Memory Controller For Micro-Threaded Memory Operations

ABSTRACT

A micro-threaded memory device. A plurality of storage banks are provided, each including a plurality of rows of storage cells and having an access restriction in that at least a minimum access time interval must transpire between successive accesses to a given row of the storage cells. Transfer control circuitry is provided to transfer a first amount of data between the plurality of storage banks and an external signal path in response to a first memory access request, the first amount of data being less than a product of the external signal path bandwidth and the minimum access time interval.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/486,068, filed on Apr. 12, 2017 for “Micro-Threaded Memory” on behalf of inventors Frederick A. Ware, Craig E. Hampel, Wayne S. Richardson, Chad A. Bellows and Lawrence Lai, which in turn is a continuation of U.S. patent application Ser. No. 14/449,610, filed on Aug. 1, 2014 for “Micro-Threaded Memory” on behalf of inventors Frederick A. Ware, Craig E. Hampel, Wayne S. Richardson, Chad A. Bellows and Lawrence Lai (now U.S. Pat. No. 9,652,176), which in turn is a continuation of U.S. patent application Ser. No. 13/901,014, filed on May 23, 2013 for “Micro-Threaded Memory” on behalf of inventors Frederick A. Ware, Craig E. Hampel, Wayne S. Richardson, Chad A. Bellows and Lawrence Lai (now U.S. Pat. No. 9,292,223), which in turn is a continuation of U.S. patent application Ser. No. 10/998,402, filed on Nov. 29, 2004 for “Micro-Threaded Memory” on behalf of inventors Frederick A. Ware, Craig E. Hampel, Wayne S. Richardson, Chad A. Bellows and Lawrence Lai (now U.S. Pat. No. 8,595,459). Each aforementioned patent and/or application is hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention relates generally to memory systems and components thereof.

BACKGROUND

Although dynamic random access memory (DRAM) remains the memory of choice for a broad class of computing and consumer electronics applications, DRAM core access times have not scaled with memory bandwidth demand. For example, the minimum time between activation of different storage rows in the same storage bank, t_(RC), remains in the neighborhood of 40 nanoseconds for predominant core technologies, a substantial access time penalty for processors operating at gigahertz frequencies. Other core access times such as the minimum time between activation of rows in different banks of a multi-bank array, t_(RR), and minimum time between column access operations (i.e., read or write operations at a specified column address) in the same row, t_(CC), have also been slow to improve.

Designers have countered core timing limitations through a number of architectural and system-level developments directed at increasing the number of column access operations per row activation (e.g., paging, multi-bank arrays, prefetch operation), and maximizing the amount of data transferred in each column access. In particular, signaling rate advances have enabled progressively larger amounts of data to be transferred per column access, thereby increasing peak memory bandwidth. However, as signaling rates progress deeper into the gigahertz range and the corresponding core access times remain relatively constant, column transaction granularity, the amount of data transferred per column access, is forced to scale upwards and is approaching limits imposed by signal paths within the DRAM itself. Further, the trend in some classes of data processing applications, graphics applications for example, is toward smaller data objects (e.g., triangle fragments of a 3D scene) that are often stored in dispersed memory locations. In such applications, the additional power and resources expended to increase the column transaction granularity may provide only limited increase in effective memory bandwidth as much of the fetched data may not be used.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 illustrates partitioning of a data transfer interval and data transfer path to enable reduced column transaction granularity;

FIG. 2 illustrates an embodiment of a memory device in which micro-threaded column operations may be performed;

FIG. 3 illustrates more detailed embodiments of sub-banks, column decoders, and data interfaces that may be used within the memory device of FIG. 2;

FIG. 4 illustrates a single-threaded mode of operation within the memory device 100 of FIG. 2;

FIG. 5 illustrates an embodiment of a request interface that may be used within the memory device of FIG. 2 to enable single-threaded and micro-threaded memory transactions;

FIG. 6 illustrates an exemplary timing of row and column strobe signal assertions by the request decoder of FIG. 5 when the memory device of FIG. 2 is operating in a single-threaded mode;

FIGS. 7 and 8 illustrate a memory device and an exemplary sequence of micro-threaded memory transactions that may be performed in the memory device when operated in a micro-threaded mode;

FIG. 9 illustrates an exemplary timing of register strobe signal assertions by the request decoder of FIG. 5;

FIG. 10 illustrates exemplary link-staggered micro-threaded memory transactions that may be performed in an alternative embodiment of the memory device of FIG. 2;

FIG. 11 illustrates an alternative link-staggered data transfer mode that may be used in other memory device embodiments;

FIGS. 12A and 12B illustrate exemplary data path interfaces that may be used to support the time-staggered and link-staggered data transfers shown in FIGS. 8 and 10;

FIGS. 13 and 14 illustrate a memory device and an exemplary sequence of micro-threaded memory transactions that may be performed in the memory device when operated in an alternative micro-threaded mode;

FIG. 15 illustrates an embodiment of a request interface that may be used within the memory device 100 of FIG. 2 to enable the micro-threaded memory transactions described in reference to FIGS. 13 and 14;

FIGS. 16 and 17 illustrate a memory device and an exemplary sequence of micro-threaded memory operations in which separate row and column addresses are used to access sub-banks in each of four storage bank quadrants of the memory device within a single t_(CC) interval;

FIG. 18 illustrates an embodiment of a request interface that may be included within the memory device of FIG. 16;

FIG. 19 illustrates an exemplary timing of control signal assertions by the request decoder of FIG. 18;

FIGS. 20A and 20B illustrate exemplary row request formats;

FIGS. 21A and 21B illustrate exemplary column request formats;

FIGS. 22 and 23 illustrate a memory device having a request interface and data path interface to interface with legacy request and data paths, and an exemplary sequence of micro-threaded memory operations in the memory device;

FIG. 24 illustrates a more detailed example of address information provided via the request path shown in FIG. 23;

FIG. 25 illustrates exemplary configuration information that may be provided in conjunction with a load mode register command issued to the memory device of FIG. 22;

FIGS. 26 and 27 illustrate a memory device having the data path interface described in reference to FIGS. 22 and 23, and an exemplary sequence of four-by-four micro-threaded memory operations in the memory device;

FIG. 28 illustrates an exemplary timing signal arrangement that may be used to convey a fourth bank address bit to the memory device of FIG. 26; and

FIG. 29 illustrates an embodiment of a memory system that includes a memory controller and at least one micro-threaded memory device.

DETAILED DESCRIPTION

In the following description and in the accompanying drawings, specific terminology and drawing symbols are set forth to provide a thorough understanding of the present invention. In some instances, the terminology and symbols may imply specific details that are not required to practice the invention. For example, the interconnection between circuit elements or circuit blocks may be shown or described as multi-conductor or single-conductor signal lines. Each of the multi-conductor signal lines may alternatively be single-conductor signal lines, and each of the single-conductor signal lines may alternatively be multi-conductor signal lines. Signals and signaling paths shown or described as being single-ended may also be differential, and vice-versa. Similarly, signals described or depicted as having active-high or active-low logic levels may have opposite logic levels in alternative embodiments. As another example, circuits described or depicted as including metal oxide semiconductor (MOS) transistors may alternatively be implemented using bipolar technology or any other technology in which a signal-controlled current flow may be achieved. With respect to terminology, a signal is said to be “asserted” when the signal is driven to a low or high logic state (or charged to a high logic state or discharged to a low logic state) to indicate a particular condition. Conversely, a signal is said to be “deasserted” to indicate that the signal is driven (or charged or discharged) to a state other than the asserted state (including a high or low logic state, or the floating state that may occur when the signal driving circuit is transitioned to a high impedance condition, such as an open drain or open collector condition). Multi-level signaling in which each transmitted symbol conveys more than one bit of information (i.e., bit rate is greater than baud rate) may also be used. A signal driving circuit is said to “output” a signal to a signal receiving circuit when the signal driving circuit asserts (or deasserts, if explicitly stated or indicated by context) the signal on a signal line coupled between the signal driving and signal receiving circuits. A signal line is said to be “activated” when a signal is asserted on the signal line, and “deactivated” when the signal is deasserted. Additionally, the prefix symbol “/” attached to signal names indicates that the signal is an active low signal (i.e., the asserted state is a logic low state). A line over a signal name (e.g., ‘<signal name>’) is also used to indicate an active low signal. The term “terminal” is used to mean a point of electrical connection. The term “exemplary” is used to express an example, not a preference or requirement.

In embodiments described herein, the data transfer capacity of a dynamic random access memory (DRAM) device over a given t_(CC) interval, a metric referred to herein as a t_(CC) envelope, is subdivided and allocated to multiple column access transactions, thereby reducing the amount of data transferred in any one transaction, yet maintaining the peak memory bandwidth of the DRAM device. Referring to FIG. 1, for example, instead of following a conventional, single-threaded approach of dedicating each t_(CC) envelope to a single column access transaction as shown at 90, the t_(CC) envelope is partitioned to enable transfer of multiple smaller sets of data in response to multiple micro-threaded column requests as shown at 92. By reducing the column transaction granularity in this manner, the effective bandwidth of the DRAM may be substantially increased over the single-threaded approach as multiple data objects of interest may be specifically addressed in different micro-threaded column access transactions and returned within a given t_(CC) interval, rather than merely a single data object and its potentially superfluous neighboring data.

In one embodiment, the t_(CC) envelope is partitioned temporally into a set of partial t_(CC) intervals (t_(CCp)) that are allocated to respective micro-threaded column transactions. In another embodiment, the t_(CC) envelope is spatially partitioned, with data path resources between the memory device core and a host device (e.g., a memory controller) being subdivided and allocated to different micro-threaded column transactions. Referring again to FIG. 1, for example, the data links, DQ, that form an external data path between a memory device and memory controller may be partitioned into two or more subsets of data links, DQ_(p), that are allocated to different micro-threaded column access transactions. In other embodiments, both temporal and spatial partitioning are applied to further reduce the column transaction granularity. In FIG. 1, for example, the t_(CC) envelope is partitioned temporally into two partial t_(CC) intervals, t_(CCp), and spatially into two subsets of data links, DQ_(p), thereby reducing the micro-threaded column transaction granularity to one-fourth of the single-threaded column transaction granularity. The t_(CC) envelope may be subdivided into more or fewer spatial and/or temporal partitions in other embodiments. Also, the multiple micro-threaded column access requests serviced in a partitioned t_(CC) interval may be directed to an open page of the same storage bank, to open pages of different storage banks, or any combination thereof. Also, in one embodiment, more densely pipelined row operations are used to enable sub-banks within a partitioned bank architecture to be separately addressed, in effect increasing the number of banks within the memory device and enabling each of multiple micro-threaded column accesses, serviced within the same t_(CC) interval, to be directed to an independently selected bank, row and column.
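The effect of temporal and spatial partitioning on column transaction granularity can be sketched as a simple division. This is a minimal illustration rather than anything defined by the specification: the function name `micro_threaded_granularity` and its parameters are hypothetical, and the 32-byte envelope is only an example value.

```python
def micro_threaded_granularity(envelope_bytes, temporal_parts, spatial_parts):
    """Bytes moved per micro-threaded column transaction.

    envelope_bytes -- size of the t_CC envelope (example value, assumed)
    temporal_parts -- number of partial t_CC intervals (t_CCp)
    spatial_parts  -- number of data-link subsets (DQ_p)
    """
    if temporal_parts < 1 or spatial_parts < 1:
        raise ValueError("partition counts must be at least 1")
    # Each transaction receives one temporal slice of one link subset.
    return envelope_bytes / (temporal_parts * spatial_parts)


# Two temporal and two spatial partitions give one-fourth of the
# single-threaded granularity, matching the FIG. 1 example.
single = micro_threaded_granularity(32, 1, 1)
micro = micro_threaded_granularity(32, 2, 2)
print(single, micro, micro / single)
```

With no partitioning the granularity equals the full envelope; each additional partition in either dimension divides it further while the total data moved per t_(CC) interval stays the same.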

Overview of an Exemplary Micro-Threaded Memory Architecture

FIG. 2 illustrates an embodiment of a memory device 100 in which various types of micro-threaded column operations may be performed. The memory device 100 is assumed, for purposes of description, to be a DRAM device, but may alternatively be any type of memory device having multiple storage arrays that share addressing and/or data path resources in a manner that imposes timing constraints on sequential accesses directed to the different storage arrays. Thus, t_(CC) constraints and t_(RR) constraints herein may alternatively be other types of memory access constraints that, together with signaling path bandwidth, define corresponding data transfer envelopes. For example, the t_(CC) envelope partitioning described herein is intended as an instance of more general partitioning of any data transfer envelope defined by a resource-imposed time constraint and signaling path bandwidth. Also, with regard to the timing constraint itself, such constraint may be defined to be a minimum time necessary to avoid resource conflicts within the memory device (e.g., to ensure proper operation), plus an optional tolerance time to account for statistical variation in the time required for circuits and/or signal paths to reach a desired state. A timing constraint may also be enforced or otherwise defined by a timing signal such as a clock signal or strobe signal and thus may be expressed as a minimum number of transitions of such timing signal that are to transpire between back-to-back operations in order to avoid resource conflicts within the memory device.

In the particular example shown, the memory device 100 includes a request interface 101, column decoders 103₀-103₃, row decoders 113₀-113₃, data path interfaces 105A and 105B and eight storage banks, B0-B7. Each storage bank, B0-B7, is formed by a pair of A and B sub-banks (e.g., sub-banks B0-A and B0-B constitute bank B0, sub-banks B1-A and B1-B constitute bank B1, and so forth), with the sub-banks themselves being organized in four groups of four sub-banks each, referred to herein as quadrants. The four quadrants are designated Q0-Q3 in FIG. 2. In the embodiment of FIG. 2, sub-banks in the same quadrant share a row decoder and column decoder, and are all either A or B sub-banks from either the even bank set or odd bank set. For example, quadrant Q0 includes even-numbered, group A sub-banks (i.e., B0-A, B2-A, B4-A and B6-A) coupled to column decoder 103₀ via respective column paths (collectively designated 117₀) and to row decoder 113₀ via respective sets of word lines (collectively designated 115₀). In the remaining quadrants, Q1 includes odd-numbered, group A sub-banks coupled to column decoder 103₁ via column paths 117₁ and to row decoder 113₁ via word lines 115₁; Q2 includes even-numbered, group B sub-banks coupled to column decoder 103₂ via column paths 117₂ and to row decoder 113₂ via word lines 115₂; and Q3 includes odd-numbered, group B sub-banks coupled to column decoder 103₃ via column paths 117₃ and to row decoder 113₃ via word lines 115₃. While the architecture of memory device 100 is carried forward in descriptions of various memory device embodiments described herein, in all cases such memory devices may have any number of banks, any number of sub-banks per bank, and any number of sub-banks per decoder-sharing group.
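The quadrant assignment just described (even or odd bank, group A or group B) can be captured in a few lines. The following is a sketch for the eight-bank, two-sub-bank example only; the function names `quadrant` and `sub_banks_in_quadrant` are hypothetical and not taken from the specification.

```python
def quadrant(bank, group):
    """Return the quadrant index (0-3) holding sub-bank B<bank>-<group>.

    Q0: even banks, group A    Q1: odd banks, group A
    Q2: even banks, group B    Q3: odd banks, group B
    """
    if not 0 <= bank <= 7:
        raise ValueError("bank must be in the range 0-7")
    if group not in ("A", "B"):
        raise ValueError("group must be 'A' or 'B'")
    return (2 if group == "B" else 0) + (bank % 2)


def sub_banks_in_quadrant(q):
    """List the four sub-banks that share quadrant q's row and column decoders."""
    if not 0 <= q <= 3:
        raise ValueError("quadrant must be in the range 0-3")
    group = "A" if q < 2 else "B"
    parity = q % 2
    return [f"B{b}-{group}" for b in range(8) if b % 2 == parity]


print(sub_banks_in_quadrant(0))  # even-numbered, group A sub-banks
print(quadrant(5, "B"))          # odd bank, group B
```

Because the quadrant index doubles as the decoder subscript in this example, `quadrant(bank, group)` also identifies which column decoder 103 and row decoder 113 serve a given sub-bank.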

When operated as part of a memory system, the request interface 101 of memory device 100 receives a stream of requests (or commands or instructions) from a memory controller or other host device via a request path (not shown), and issues corresponding control and address signals to the row decoders 113 and column decoders 103 to carry out the requested operations. As a matter of terminology, the term “request” is used herein to mean a request, command or instruction issued to the memory device 100 to cause the memory device to take an action specified in the request or by the context in which the request is received. The action taken by the memory device in response to a given request is referred to as an operation, examples of which include row activation operations, column access operations (which may be read or write accesses) and precharge operations. A request and its corresponding operation are referred to collectively herein as a transaction. Also, some transactions may include multiple component requests and operations. In the case of a DRAM device, for example, a complete data access transaction may be specified by a row-activation request, one or more column access requests and a precharge request. A row activation request is directed to a bank and row of the memory device 100 (e.g., specified by bank and row addresses included with the request) and is serviced by enabling the contents of the row to be output onto the bit lines of the bank and thereby transferred into a page buffer (e.g., a storage structure formed by latching sense amplifiers coupled respectively to the bit lines). Column access requests are directed to a bank and column of the memory device 100 (e.g., specified by bank and column addresses included with the request) and are serviced by reading or overwriting data in column-address-specified sub-fields (columns) within the page buffer for the specified bank. After the column accesses directed to an open page (i.e., page buffer content) are completed, a precharge operation may be carried out to precharge the bit lines of the subject bank in preparation for subsequent row activation.
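The activate, column access, precharge sequence of a complete data access transaction can be modeled as a small state holder per bank. This is a behavioral sketch only: the class `Bank` and its methods are hypothetical, timing is ignored, and rejecting an activation while a page is open is a modeling assumption rather than a behavior stated in the text.

```python
class Bank:
    """Toy model of one bank's page-buffer state (no timing, no data)."""

    def __init__(self):
        self.open_row = None  # None means the bank is precharged

    def activate(self, row):
        # Assumption: a bank must be precharged before another row is opened.
        if self.open_row is not None:
            raise RuntimeError("precharge required before another activation")
        self.open_row = row

    def column_access(self, column):
        # Column accesses operate on the open page held in the page buffer.
        if self.open_row is None:
            raise RuntimeError("no open page; activate a row first")
        return (self.open_row, column)

    def precharge(self):
        # Closes the open page and readies the bank for the next activation.
        self.open_row = None


bank = Bank()
bank.activate(row=7)
accesses = [bank.column_access(c) for c in (2, 9)]  # two column accesses
bank.precharge()
print(accesses, bank.open_row)
```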

When a row activation request is received, the request interface 101 recovers bank and row address values from the request and forwards the address values via signal paths 111₁-111₃ to the row decoders 113₀-113₃ for the bank-address-specified quadrants. In one embodiment, each of the row decoders 113 includes a first stage decoder to select the set of word lines coupled to a bank-address-specified sub-bank, and a second stage decoder to activate a row-address-specified word line within the selected set of word lines, thereby enabling the contents of the cells coupled to the activated word line onto the bit lines of the selected sub-bank. In other embodiments, the bank and row decoding operation may be carried out in more or fewer decoder stages. Also, one or more row address strobe signals may be issued by the request interface 101 or other control logic to control the timing of the word line activation.

When a column access request is received in the memory device 100, the request interface 101 recovers bank and column address values from the request and forwards the address values to the column decoders 103 for the quadrants specified by the bank address. In one embodiment, each of the column decoders 103 includes a bank multiplexer to enable access to the page buffer for a bank-address-indicated sub-bank (i.e., via a selected one of column access paths 117), and a column multiplexer to select a column of page buffer storage elements for read or write access. Other circuit arrangements may be used to resolve the column access location in alternative embodiments. Also, one or more column address strobe signals may be issued by the request interface 101 or other control logic in the memory device 100 to control the timing of the column access.

After column operations in the open page are completed (e.g., read or write operations carried out in response to corresponding column access requests), a precharge request and associated bank address may be received in the request interface 101 and serviced by deactivating a previously activated word line and precharging the bit lines for the specified bank. After the precharge operation is complete, the specified bank is in condition for another row activation operation.

Still referring to FIG. 2, each of the data path interfaces 105A, 105B is coupled to a respective pair of column decoders 103 via column data paths 119 to enable transfer of read and write data between the column decoders and an external data path (not shown). More specifically, data path interface 105A enables data transfer between a first portion of the external data path and selected sense amplifiers within a page buffer (i.e., selected by column decoders 103₀ and 103₁) via column data paths 119₀ and 119₁, and data path interface 105B enables data transfer between a second portion of the external data path and selected sense amplifiers within a page buffer (i.e., selected by column decoders 103₂ and 103₃) via column data paths 119₂ and 119₃.

FIG. 3 illustrates more detailed embodiments of the Q0 and Q1 sub-banks (i.e., sub-banks B0-A through B7-A), column decoders 103₀/103₁, and data interface 105A that may be used within the memory device 100 of FIG. 2. The Q2 and Q3 sub-banks, column decoders 103₂/103₃ and data interface 105B may be implemented in embodiments similar or identical to those depicted in FIG. 3 and therefore are not separately described.

In the embodiment of FIG. 3, each of the sub-banks B0-A to B7-A includes a storage array 145 and page buffer 147 coupled to one another via bit lines 169. Referring to detail view 165, the storage array 145 is formed by memory cells 170 arranged in rows and columns. Each column of memory cells 170 is coupled via a bit line 169 to a respective sense amplifier 168 within page buffer 147, and each row of memory cells is coupled via a word line 166 to a row decoder 113 (or component thereof). In the particular embodiment shown, each memory cell 170 is a DRAM memory cell formed by a transistor switch (e.g., having a drain coupled to bit line 169 and a gate coupled to word line 166) and a capacitive storage element coupled between the source of the transistor switch and a cell plate reference or other reference node. Memory cells of other types and configurations may be used in alternative embodiments. As discussed above, during a row activation operation, a word line is activated, and contents of the storage cells 170 coupled to the word line (i.e., the word-line selected row) are enabled onto the bit lines 169 and thereby transferred to the page buffer 147. During a precharge operation, the open page (i.e., content of the page buffer) is closed and the bit lines 169 precharged in preparation for activation of another row. Refresh operations may be performed in rows of a given sub-bank or set of sub-banks through combinations of activation and precharge operations.

As discussed above, the column decoders 103₀ and 103₁ enable column operations (i.e., read or write operations) directed to open pages within the sub-banks of quadrants Q0 and Q1, respectively. In the embodiment of FIG. 3, each of the column decoders 103₀, 103₁ includes a set of four column multiplexers 149, one for each sub-bank of the corresponding quadrant, and a bank multiplexer 151. The bank multiplexer 151 enables access to (i.e., selects) one of the column multiplexers 149 in response to a bank address, or at least the most significant bits (MSBs) or other subset of bits within the bank address. Referring to detail view 165, the selected column multiplexer 149 includes a set of multiplexer/demultiplexer circuits 164 that each enable read or write access to a respective column-address-selected column of storage elements within the page buffer 147 (column, for short), an operation referred to herein as a column access. In the case of a write operation, the bank multiplexer 151 and column multiplexer 149 perform demultiplexing functions, routing data from the data path interface 105A to the selected column. In a read operation, the bank multiplexer 151 and column multiplexer 149 perform a multiplexing function by routing data from the selected column to the data interface 105A. Note that the bank multiplexer 151 and column multiplexers 149 may be interchanged in an alternative embodiment, so that the bank multiplexer 151 is coupled to the sub-bank bit lines 169 and a single column multiplexer is coupled between the output of the bank multiplexer 151 and the data path interface 105A.

In the embodiment of FIG. 3, the data path interface 105A includes a pair of serdes transceivers 173 (i.e., serializing/deserializing transceivers that perform a parallel-to-serial conversion of outgoing data and a serial-to-parallel conversion of incoming data) each coupled between an external data path interface 171 and a respective one of column decoders 103₀ and 103₁. Each serdes transceiver 173 includes a data serializer 177, transmitter 175, data deserializer 181 and receiver 179. In a read operation, the data serializer 177 performs a multiplexing function by receiving a 128-bit read data value (i.e., column data) from the column decoder 103 via column data path 119 and delivering the read data to the transmitter 175 in the form of a sixteen-byte stream. The transmitter 175, in turn, transmits each byte via the external data path interface 171 in successive data transmission intervals. The receiver 179 and data deserializer 181 perform the inverse functions of the data serializer 177 and transmitter 175. That is, the receiver 179 samples signals arriving via the external data path interface 171 during successive data reception intervals to deliver a stream of sixteen bytes to the data deserializer 181. The data deserializer 181 performs a demultiplexing function by gathering the incoming stream of bytes into a 128-bit write data value to be delivered to the corresponding column decoder 103 via column data path 119. Although an 8-bit external data path interface 171 and a 128-bit column data path 119 are shown, different path widths may be used in alternative embodiments. In alternative embodiments in which data multiplexing/demultiplexing is unnecessary (e.g., external data path width matches the column data size of the memory device), the data serializer 177 and/or data deserializer 181 may be omitted.
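The 128-bit to sixteen-byte conversion performed by the data serializer and data deserializer can be illustrated as follows. This sketch models only the width conversion; the function names are hypothetical, and the most-significant-byte-first ordering is an assumption, since the text does not specify the order in which bytes are transmitted.

```python
COLUMN_BITS = 128   # column data path width in the example
LINK_BITS = 8       # external data path interface width in the example
BYTES_PER_COLUMN = COLUMN_BITS // LINK_BITS  # sixteen transmission intervals


def serialize(column_data):
    """Split a 128-bit value into sixteen bytes (MSB first, assumed order)."""
    if not 0 <= column_data < (1 << COLUMN_BITS):
        raise ValueError("column data must fit in 128 bits")
    return list(column_data.to_bytes(BYTES_PER_COLUMN, "big"))


def deserialize(byte_stream):
    """Gather sixteen received bytes back into one 128-bit value."""
    if len(byte_stream) != BYTES_PER_COLUMN:
        raise ValueError("expected exactly sixteen bytes")
    return int.from_bytes(bytes(byte_stream), "big")


value = 0x0123456789ABCDEF0011223344556677
stream = serialize(value)
print(len(stream), deserialize(stream) == value)
```

The round trip shows that the deserializer is the inverse of the serializer, which is the relationship the paragraph above describes for the receive and transmit paths.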

Still referring to FIG. 3, the transmitters 175 may be, for example, current-mode or voltage-mode output drivers for generating output waveforms having virtually any amplitude and any type of modulation. Also, the transmitters 175 may generate multi-bit symbols (i.e., bit rate greater than baud rate), perform various encoding operations (e.g., 8b/10b) and/or add error checking information into the outgoing bitstream (e.g., parity bits, error-correction code (ECC) bits, checksum, cyclic-redundancy check values, etc.). The receivers 179, similarly, may be designed to sample current-mode or voltage-mode transmissions modulated in any manner, with each sample being resolved into one or more bits according to the number of bits conveyed in each transmitted symbol. The receiver may additionally perform decoding operations and error checking operations. Further, both the transmitter and receiver may be switched between different operating modes and/or frequencies, for example, operating on multi-bit symbols in one mode and single-bit symbols in another mode.

The signaling links that constitute the external data path may be point-to-point or multi-drop, differential or single-ended, and may be used to carry synchronous or asynchronous transmissions. The data path interface 105A, accordingly, may be a synchronous or asynchronous signaling interface. In the case of synchronous transmissions, the transmitted data signals may be self-timed (e.g., carrying clocking information within the data waveform) or accompanied by timing signals such as one or more clock signals, strobe signals or the like. In the case of self-timed transmissions, encoding circuitry may be provided in the transmitter 175 to encode each outgoing bit stream (i.e., the stream transmitted on any single link of the external data path interface in synchronism with one or more transmit clock signals) to ensure sufficient transition density for clock recovery (e.g., 8b/10b encoding), and corresponding clock-data recovery circuitry and decoding circuitry may be provided in the receiver 179 to recover clocking information and the un-encoded transmitted data. Such clocking information, whether received in the form of external timing signals (e.g., clock or strobe signals) or recovered from the incoming bit stream, may be used to control the phase of one or more sampling clock signals that are supplied to the receiver 179 to trigger sampling of the incoming data signals.

Single-Threaded Mode

FIG. 4 illustrates a single-threaded mode of operation within the memory device 100 of FIG. 2 and is provided, in part, for contrast with later micro-threaded operating modes described below. Referring to request path 201 (RQ), a pipelined stream of requests is received over a series of request intervals 200, with gray shaded blocks indicating requests that form part of a multi-access read transaction 205 directed to a selected bank. The striped blocks represent requests directed to other banks. In the exemplary embodiment shown, the request path is operated at a legacy signaling rate (i.e., lower than maximum supported signaling rate) of 0.8 Gigabits per second (Gb/s) per link. Each command is transferred over a pair of transmit intervals, thereby establishing a 2.5 ns request interval. In alternative embodiments, higher or lower signaling rates may be used, and each request may be conveyed in more or fewer transmit intervals.

The request path 201 is shown as a set of logical pipelines 203A, 203B and 203C to help visualize the relative timing of different types of requests included in the multi-access read transaction 205. Pipeline 203A is referred to herein as the activation pipeline (RQ-ACT) and carries row activation requests (e.g., 207), each including a row activation command specifier together with bank and row address values to identify the specific row to be activated. Pipeline 203B is referred to herein as the column access pipeline (RQ-CA) and carries column access requests (e.g., 209), each including a column access command specifier, specifying either a read or write access, together with bank and column address values to identify the bank and column to be accessed. Pipeline 203C is referred to herein as the precharge pipeline and carries precharge requests (e.g., 213) each including a precharge command specifier and bank address to indicate the bank to be precharged.

The multi-access read transaction is initiated when a row activation request 207 directed to row ‘z’ of bank B0 (B0-Rz) is received. The memory device responds to the row activation request by issuing the bank address and row address to the row decoders for bank B0 (i.e., row decoders 113₀ and 113₂). In the particular embodiment shown, the minimum time between row activations in the same bank, t_(RC), is assumed to be 40 ns so that another row activation request directed to bank B0 is not received until sixteen request intervals later, as shown at 215. Also, the minimum time between row activations in arbitrarily different banks, t_(RR), is assumed to be 10 ns, so that another row activation request 217 is not received until four request intervals later.
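The request-interval counts quoted above follow from the 0.8 Gb/s request signaling rate and the two transmit intervals per request given for this example. A short sketch of the arithmetic, using exact fractions to avoid rounding; the names `request_interval_ns` and `intervals_for` are hypothetical.

```python
import math
from fractions import Fraction

REQUEST_RATE_GBPS = Fraction("0.8")   # per-link request signaling rate (example)
TRANSMIT_INTERVALS_PER_REQUEST = 2    # each request spans two transmit intervals


def request_interval_ns():
    """Duration of one request interval; 1 Gb/s corresponds to 1 bit per ns."""
    return TRANSMIT_INTERVALS_PER_REQUEST / REQUEST_RATE_GBPS


def intervals_for(constraint_ns):
    """Smallest whole number of request intervals covering a timing constraint."""
    return math.ceil(Fraction(constraint_ns) / request_interval_ns())


print(float(request_interval_ns()))  # request interval in ns
print(intervals_for(40))             # t_RC expressed in request intervals
print(intervals_for(10))             # t_RR expressed in request intervals
```

Rounding up with `math.ceil` is a conservative choice for constraints that are not whole multiples of the request interval; in this example both constraints divide evenly.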

The first column access request 209 of two column access requests that form part of the multi-access read transaction 205 is received a predetermined time after the activation request 207 (five request intervals later in this example) and specifies a read at column ‘a’ of the B0 open page (B0-Ca). As the minimum time between accesses to the same open page, t_(CC), is assumed to be 5 ns in this example, and the memory device is operating in single-threaded mode, the second of the two column access requests, 211, is received two request intervals after receipt of the first column access request (i.e., a t_(CC) interval later), and specifies a read at a different column, column ‘b’, of the B0 open page (B0-Cb). Referring briefly to FIG. 2, the request interface 101 responds to the first column access request 209 by delivering bank and column address values via shaded column address paths 109 ₀ and 109 ₂ to the even bank column decoders 103 ₀ and 103 ₂ which, in turn, retrieve data from column ‘a’ of the B0-A and B0-B sub-banks as shown. The retrieved column ‘a’ data is then delivered to the data interface via the shaded column data paths 119 ₀ and 119 ₂. The request interface 101 similarly responds to the second column access request 211 by delivering bank and column address values to the even bank column decoders 103 ₀ and 103 ₂ which then retrieve data from column ‘b’ of the B0-A and B0-B sub-banks and output the data to the data path interfaces 105A and 105B.
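The request-interval arithmetic used in the single-threaded example can be checked with a short calculation. The following is a minimal sketch only; the function and variable names are hypothetical, and the values (0.8 Gb/s request rate, two transmit intervals per request, and the assumed 40 ns, 10 ns and 5 ns core constraints) are those of the exemplary embodiment rather than requirements.

```python
from fractions import Fraction

# Illustrative values from the single-threaded example (legacy rate).
REQUEST_RATE_GBPS = Fraction("0.8")   # request-path signaling rate per link
TRANSMIT_INTERVALS_PER_REQUEST = 2    # each request spans two transmit intervals


def request_interval_ns(rate_gbps, transmit_intervals):
    """Duration of one request interval; one transmit interval lasts 1/rate ns."""
    return Fraction(transmit_intervals) / Fraction(rate_gbps)


def request_intervals(constraint_ns, interval_ns):
    """Number of request intervals spanned by a core timing constraint."""
    return Fraction(constraint_ns) / interval_ns


interval = request_interval_ns(REQUEST_RATE_GBPS, TRANSMIT_INTERVALS_PER_REQUEST)

# Assumed core constraints of the example, in nanoseconds.
spacing = {name: request_intervals(ns, interval)
           for name, ns in {"tRC": 40, "tRR": 10, "tCC": 5}.items()}

print(float(interval), {name: int(count) for name, count in spacing.items()})
```

Exact fractions are used instead of floating point so that the interval counts compare cleanly as integers.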

Returning to FIG. 4, the column data retrieved in response to the first column access request 209 is output onto external DQA and DQB signal paths 225 in a data transfer operation that begins a predetermined time after receipt of the column access request 209. The correspondence between the column access request 209 and outgoing (or incoming) column data is illustrated by lead lines 227 extending from the column access request (i.e., in pipeline 203B) to a like-shaded data transfer over signal paths 225; a notation used in FIG. 4 and other figures described below. In this single-threaded example, column data is transferred in response to the first column access request over the t_(CC) interval starting at 215. More specifically, column ‘a’ data is output from sub-bank B0-A over the DQA links (eight links in this example, though more or fewer links may be provided in alternative embodiments), and column ‘a’ data is output from sub-bank B0-B over the DQB links, the transfer of column ‘a’ data from the two B0 sub-banks thus consuming the entire t_(CC) envelope as shown. As discussed above, the t_(CC) envelope is a product of the signaling bandwidth and t_(CC) interval so that, given a 5 ns t_(CC) interval (a value assumed throughout the following description, though virtually any memory core technology having the same or different t_(CC) constraint may be used) and assuming a 3.2 Gb/s signaling rate in each of the 16 DQ links, the t_(CC) envelope is 32 bytes (i.e., [16 links * 3.2 Gb/s/link] * 5 ns = 256 bits = 32 bytes). In the single-threaded example shown, one column access request is serviced per t_(CC) interval so that the column transaction granularity, CTG, is also 32 bytes. Thus, the column transaction granularity is coextensive with the t_(CC) envelope.

The column ‘b’ data retrieved in response to the second column access request 211 is output onto the DQA and DQB signal paths over a t_(CC) interval that begins at the conclusion of the column ‘a’ data transfer operation. Thus, like the column ‘a’ transaction that precedes it, the column ‘b’ transaction consumes the entire t_(CC) envelope and therefore has a column transaction granularity of 32 bytes.

Still referring to FIG. 4, a precharge request 213 is received a predetermined time after the second column access request 211 (three request intervals in this example) and includes a precharge command specifier (Pre Cmd) and bank address that indicates the bank to be precharged; B0 in this case. The request interface executes the requested precharge operation a predetermined time later (e.g., after the column ‘b’ data has been retrieved from the B0-A/B0-B page buffers) by issuing a disable signal to the row decoders for bank B0, and issuing a precharge-enable signal to the page buffers of sub-banks B0-A and B0-B to precharge the sub-bank bit lines. The precharge operation thus closes the page opened in response to the row activation request 207 and therefore concludes the multi-access read transaction 205. Because each of the two column access operations yielded 32 byte transfers, the row transaction granularity (i.e., amount of data transferred for a given row activation, RTG) is 64 bytes. More or fewer column access transactions may be performed for a given row activation in alternative embodiments, yielding correspondingly increased or decreased row transaction granularity.
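The envelope and granularity figures of the single-threaded example follow from a simple product of link count, per-link signaling rate and interval length. The sketch below is illustrative only; the function names are hypothetical, and the rate is expressed in Mb/s and the interval in picoseconds purely so that the arithmetic stays in integers.

```python
def envelope_bytes(links, rate_mbps_per_link, interval_ps):
    """Data-path capacity over an interval, in bytes (links x rate x time)."""
    # Mb/s multiplied by ps yields units of 1e-6 bit, hence the division.
    bits = links * rate_mbps_per_link * interval_ps // 1_000_000
    return bits // 8


def row_granularity_bytes(ctg_bytes, column_accesses_per_activation):
    """Data transferred per row activation (RTG)."""
    return ctg_bytes * column_accesses_per_activation


T_CC_PS = 5_000  # 5 ns t_CC interval assumed in the example

# Single-threaded mode: one column access consumes the whole t_CC envelope.
legacy_ctg = envelope_bytes(links=16, rate_mbps_per_link=3_200, interval_ps=T_CC_PS)
legacy_rtg = row_granularity_bytes(legacy_ctg, column_accesses_per_activation=2)

print(legacy_ctg, legacy_rtg)
```

The same helper may be called with other link counts, rates or intervals to see how the envelope scales.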

Assuming that a steady stream of row activation requests and corresponding column access request pairs and precharge requests are received via the request path 201, the request path 201 may remain fully loaded (i.e., no unused request intervals) and the data path 225, similarly, may be fully consumed with the requested data transfers, each having 32-byte column transaction granularities and 64-byte row transaction granularities. When the signaling rate on the data path 225 is increased to the full rate of the memory device, however, the row and column transaction granularities also increase. For example, in one embodiment, the full signaling rate supported by the data path interface 105 of the FIG. 2 memory device is 6.4 Gb/s (other signaling rates may be supported in alternative embodiments). Because the t_(CC) interval of the memory device remains unchanged, the t_(CC) envelope is doubled to 64 bytes and, if the single-threaded approach were followed, the column transaction granularity and row transaction granularity would also double to 64 bytes and 128 bytes, respectively. To support the increased column transaction granularity, a number of signal paths within the column decoders and data path interface may need to be increased. Referring to FIGS. 2 and 3, for example, because the 64 byte column transaction granularity corresponds to a 256-bit (32 byte) column access in each of the sub-banks, each of the 128-bit signal paths within the column decoders 103 (i.e., between the column multiplexer 149 and bank multiplexer 151) and the data path interfaces 105A, 105B would expand to 256-bit signal paths to support the increased data transfer rate. While potentially realizable, such increased path widths result in increased manufacturing cost and power consumption (i.e., the path widths being increased in each sub-bank and data interface of the memory device) and the available headroom for such increases is shrinking.
Also, as discussed above, in applications that tend to access small, dispersed data objects, only a small portion of the data returned in a given column access may be useful. For example, a common triangle size in modern graphics applications is six bytes and, due to rendering order, successively rendered triangles are often unlikely to be acquired in the same column access. In such applications, doubling the column transaction granularity from 32 bytes to 64 bytes may provide little improvement in effective bandwidth.
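The effect of column transaction granularity on effective bandwidth can be illustrated numerically. The sketch below assumes, purely for illustration, that exactly one six-byte object is useful in each column access; real access patterns vary, and the function names are hypothetical.

```python
from fractions import Fraction


def useful_fraction(object_bytes, ctg_bytes):
    """Fraction of each column transfer that carries useful data,
    under the simplifying assumption of one useful object per access."""
    return Fraction(min(object_bytes, ctg_bytes), ctg_bytes)


def useful_bytes_per_tcc(object_bytes, ctg_bytes, accesses_per_tcc=1):
    """Useful bytes delivered per t_CC interval under the same assumption."""
    return min(object_bytes, ctg_bytes) * accesses_per_tcc


TRIANGLE_BYTES = 6

for ctg in (32, 64):  # single-threaded CTG at 3.2 Gb/s and at 6.4 Gb/s
    print(ctg,
          float(useful_fraction(TRIANGLE_BYTES, ctg)),
          useful_bytes_per_tcc(TRIANGLE_BYTES, ctg))
```

Under this assumption the useful bytes per t_(CC) interval are unchanged when the granularity doubles, while the useful fraction of each transfer is halved, which is consistent with the limited improvement noted above.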

FIG. 5 illustrates a request interface that may be used to support a single-threaded mode of operation within the memory device as well as micro-threaded modes of operation discussed below. The request interface 300 includes a request decoder 301, even-bank row control registers 305A, 305B (EBRC), odd-bank row control registers 307A, 307B (OBRC), even-bank column control registers 309A, 309B (EBCC) and odd-bank column control registers 311A, 311B (OBCC). The even- and odd-bank row control registers 305, 307 are coupled to the request decoder 301 via a row bus 315, and the even- and odd-bank column control registers 309, 311 are coupled to the request decoder 301 via a column bus 317.

In one embodiment, incoming symbol streams received via request pads 303 (i.e., from an external request path) are deserialized in an optional request deserializer 304 to deliver a corresponding stream of n-bit wide requests to the request decoder 301. (Note that, while pads are referred to in a number of embodiments herein, in all such cases, capacitive-coupling nodes or any other interface to an external signaling path may be used.) The incoming requests may include virtually any type of requests including, without limitation, the row activation requests, column access requests and precharge requests discussed above, as well as other requests used, for example, to initiate refresh operations within one or more storage banks, program operating modes within the memory device (e.g., selecting between single-threaded mode and one or more of a number of different micro-threaded modes; and selecting between a number of different refresh modes, power modes, precharge modes, etc.), initiate signaling calibration and/or training operations, initiate self-tests and so forth.

The request decoder 301, which may be implemented by one or more state machines, microsequencers and/or other control circuitry, decodes each incoming request (e.g., by parsing a command specifier field or operation code field to identify the request) and issues various signals necessary to carry out the requested operation. For example, upon decoding a row activation request having the bank address and row address fields shown in request 207 of FIG. 4, the request decoder 301 may output the row address and bank address onto the row bus 315 along with a control value that indicates an activation operation and then, depending on whether the activation request is directed to an even or odd bank (e.g., determined by inspection of the least significant bit (LSB) or other bit or bits of the bank address), assert an even-row strobe signal 321A (ERS) or odd-row strobe signal 321B (ORS) to load the address and control values from the row bus 315 into the even-bank row control registers 305A, 305B or odd-bank row control registers 307A, 307B. Address and control values loaded into even-bank row control registers 305 are supplied to the even-bank row decoders for the group A and group B sub-banks (i.e., row decoders 113 ₀ and 113 ₂ of FIG. 2) via signal paths 111 ₀ and 111 ₂, respectively, and address and control values loaded into odd-bank row control registers 307 are supplied to the odd-bank row decoders for the group A and group B sub-banks via signal paths 111 ₁ and 111 ₃. In one embodiment, the delivery of address and control values via paths 111 initiates the indicated row operation (e.g., activation or precharge) within the corresponding row decoder so that row operations are effectively initiated in response to assertion of the even-row strobe signal 321A and odd-row strobe signal 321B (i.e., when the corresponding registers 305 and 307 are updated).
In alternative embodiments, the even-row strobe signal 321A and odd-row strobe signal 321B (or other control signals derived therefrom or independently generated) may be output to the row decoders to initiate row operations therein.

In one embodiment, the request decoder 301 responds to incoming column access requests in substantially the same manner as row activation requests, except that bank address and column address values included with the requests are output onto the column address bus 317 together with a control value that indicates, for example, whether a read or write operation is to be performed. Thereafter, the request decoder 301 asserts either an even-column strobe signal 323A (ECS) or odd-column strobe signal 323B (OCS) to load the address and control values from the column address bus 317 into either the even-bank column control registers 309A, 309B or odd-bank column control registers 311A, 311B, thereby initiating the specified column access operation in the corresponding column decoder (i.e., the contents of the column control registers 309 and 311 are output to corresponding column decoders via paths 109 to initiate column access operations therein). As with row requests, the request decoder 301 may inspect one or more bits of the bank address to determine whether a given column access request is directed to an odd or even bank and assert either the even-column strobe signal 323A or odd-column strobe signal 323B accordingly. Alternatively, the request decoder 301 may associate incoming column access requests with previously received row activation requests according to a predetermined protocol so that the bank address received in a row activation request is used to determine the set of banks, even or odd, to which a subsequently received column access request is directed. In such an embodiment, the LSB of the bank address (or other bit(s) used to specify the target set of banks) may be omitted from the column access request to enable other requests or information (e.g., precharge information) to be conveyed therein.
In either case, bank and column address values (which may omit the LSB bit or other bit(s) of the bank address used to specify the target set of banks) loaded into even-bank column control registers 309 are output to the even-bank column decoders for the group A and group B sub-banks, respectively (i.e., column decoders 103 ₀ and 103 ₂ of FIG. 7), and bank and column address values loaded into odd-bank column control registers 311 are output to the odd-bank column decoders.
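The even/odd strobe selection performed by the request decoder can be modeled in a few lines of software. This is a behavioral sketch under stated assumptions, not the circuit itself: the request tuple layout, the command mnemonics ("ACT", "PRE", "RD", "WR") and the dictionary standing in for the control registers are all hypothetical, and the bank-set bit is taken to be the LSB of the bank address, which is only one of the options described.

```python
ROW_STROBES = ("ERS", "ORS")      # even-row / odd-row strobe
COLUMN_STROBES = ("ECS", "OCS")   # even-column / odd-column strobe
ROW_COMMANDS = {"ACT", "PRE"}     # row operations: activation, precharge


def select_strobe(bank_address, strobes):
    """Pick the even or odd strobe from the LSB of the bank address."""
    return strobes[bank_address & 1]


def decode(request, registers):
    """Load the register set selected by the strobe and return the strobe name."""
    command, bank_address, address = request
    strobes = ROW_STROBES if command in ROW_COMMANDS else COLUMN_STROBES
    strobe = select_strobe(bank_address, strobes)
    # Loading the register is what initiates the operation in the decoders.
    registers[strobe] = (command, bank_address, address)
    return strobe


registers = {}
asserted = [decode(req, registers) for req in
            [("ACT", 0, "z"), ("ACT", 1, "y"), ("RD", 0, "a"), ("RD", 1, "e")]]
print(asserted)
```

In this model each strobe name doubles as the key of the register pair it loads, reflecting that the A and B registers of a pair are loaded in lock step.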

FIG. 6 illustrates an exemplary timing of row and column strobe signal assertions by the request decoder 301 of FIG. 5 when the memory device 100 of FIG. 2 is operating in a single-threaded mode. As shown, upon decoding a row activation request 207, the request decoder 301 asserts either an even-row strobe signal 321A (ERS) or an odd-row strobe signal 321B (ORS) to load either the even-row control registers 305A, 305B or the odd-row control registers 307A, 307B, respectively, with the bank address, row address and control information provided in the request 207, thereby initiating a row activation operation in either the even or odd bank sets. Although assertion of even-row strobe signal 321A is shown in FIG. 6, odd-row strobe signal 321B would be asserted to initiate row activation in an odd bank set. The least significant bit of the bank address may be used to control which of the two strobe signals 321A and 321B is asserted, and therefore need not be loaded into the selected control register. After a t_(RR) interval has elapsed, another row activation request directed to a different bank is received, and another strobe signal 321A or 321B is asserted to initiate a corresponding row activation operation.

Upon decoding a column access request 209 directed to the row activated in response to row activation command 207 (i.e., specifying the same bank address as row activation command 207), the request decoder asserts either an even-column strobe signal 323A (ECS) or odd-column strobe signal 323B (OCS) to load either the even-column control registers 309A, 309B or odd-column control registers 311A, 311B, respectively, with the bank address, column address and control information provided in the request 209, thereby initiating a column access operation (e.g., a read or write operation) within the open page for the specified bank. Although assertion of even-column strobe signal 323A is shown in FIG. 6, odd-column strobe signal 323B would be asserted to initiate a column access operation in a transaction directed to a bank in the odd bank set. Again, the least significant bit of the bank address may be used to control which of the two strobe signals 323A, 323B is asserted to initiate a column access operation. After a t_(CC) interval has elapsed, another column access request 211, directed to the same bank as request 209 but a different column address, is received within the request decoder 301. Upon decoding the column access request 211, the request decoder asserts either the even-column strobe signal 323A or odd-column strobe signal 323B to load the corresponding pair of registers 309A/309B or 311A/311B (i.e., the same pair of registers loaded in response to decoding request 209, as both requests are directed to the same open page) and thereby initiate a second column access operation directed to the row activated in response to request 207. Thus, a column strobe signal (323A or 323B) is asserted once per t_(CC) interval to enable the specified data transfer to be carried out over the complete t_(CC) interval and using all the links of the DQ path. That is, when the memory device of FIG. 2 is operated in single-threaded mode, the data transfer in response to a column access request consumes the entire t_(CC) envelope.

Reflecting on the operation of the request interface 300 of FIG. 5, it should be noted that because the even-row control registers 305A and 305B are operated in lock step (i.e., 305A and 305B are loaded in response to the same strobe signal 321A), registers 305A and 305B may be replaced by a single row control register which, when loaded, initiates the specified row operation (e.g., activation or precharge) in the sub-bank quadrants that form the even bank set (i.e., Q0, Q2). Similarly, the odd-row control register pair 307A/307B, even-column control register pair 309A/309B and odd-column control register pair 311A/311B may each be replaced by a respective single register. Further, if the request interface 300 did not include support for micro-threaded column operations, the entire register set could be reduced to a single row control register and a single column control register, the control and address information loaded into each register being provided to the row and column decoders in all four quadrants of the memory device 100. In such an embodiment, the row and column decoders within each quadrant may determine whether to initiate a row or column operation, for example, based on the least significant bit of the bank address (i.e., if BA[0] is a ‘0’, row/column operations are initiated in the decoders of the even quadrants, Q0/Q2, and if BA[0] is a ‘1’, row/column operations are initiated in the decoders of the odd quadrants, Q1/Q3). When the memory device 100 is operated in micro-threaded mode, however, the additional row control registers enable temporally overlapping row operations in different regions of memory device 100 and the additional column control registers enable temporally overlapping column operations in different regions of memory device 100. In one embodiment, the different regions in which overlapping operations are performed are the even and odd bank sets of memory device 100.
In another embodiment, the different regions are the four quadrants of the memory device 100. These embodiments and others are discussed in further detail below.

Micro-Threaded Memory Transactions

FIGS. 7 and 8 illustrate an exemplary sequence of micro-threaded memory transactions that may be performed in the memory device 100 of FIG. 2 when operated in a micro-threaded mode at full signaling rate (e.g., data path and request path signaling rates increased to 6.4 Gb/s and 1.6 Gb/s, respectively; doubling the 3.2 Gb/s and 0.8 Gb/s legacy signaling rates described above). Rather than allocating the full t_(CC) envelope to a single column access (i.e., as in the single-threaded mode described in reference to FIGS. 2-6) and redesigning the memory device 100 to include double-width internal data paths, the t_(CC) envelope is subdivided into sub-envelopes that are allocated to alternating transactions in the odd and even bank sets. That is, recognizing that the t_(RR) constraint applies to arbitrary bank selection and is imposed primarily to avoid conflicting use of resources shared by banks in the same bank set (e.g., row decoders shared by even banks and row decoders shared by odd banks), it follows that row activation operations directed alternately to the odd and even bank sets may be executed in sub-intervals within the overall t_(RR) interval, referred to herein as partial t_(RR) intervals, t_(RRp). Further, because distinct sets of column decoders and distinct data path resources are provided for the even and odd bank sets, micro-threaded column operations directed to the activated rows in alternate bank sets may be executed one after another within a single t_(CC) interval, in sub-intervals referred to herein as partial t_(CC) intervals, t_(CCp).
Through this approach, decoder and data path resources within the memory device 100 that are used at approximately 50% duty in single-threaded mode (i.e., using either the even-bank resources or the odd-bank resources for a given row or column operation) are used concurrently in the micro-threaded mode to support micro-threaded column access operations. Because the data transferred in each micro-threaded column access operation consumes only a portion of a t_(CC) envelope, reduced column transaction granularity is achieved relative to the granularity for single-threaded operation at the same signaling rate. Thus, data throughput is effectively doubled without having to double the widths of internal data paths of the memory device, while at the same time reducing column transaction granularity.

Referring to the depiction of memory device 100 in FIG. 7 and the row activation pipeline 251A shown in FIG. 8, a first row activation request 253 initiates activation of bank B0, row ‘z’ (i.e., “Rz” in FIG. 7, “Act B0-Rz” in FIG. 8). One t_(CC) interval later (i.e., four 1.25 ns request intervals later at the exemplary 1.6 Gb/s request path signaling rate shown), a second activation request 255 initiates activation of bank B1, row ‘y’ (Act B1-Ry). Note that, while two row activation requests 253, 255 are received within a single t_(RR) interval, the two requests are directed alternately to even and odd banks (B0 and B1 in this example) and therefore do not conflict. A predetermined time after receipt of the B0-Rz activation request 253, a first micro-threaded column access request 257 specifying a read at bank B0, column ‘a’ is received (i.e., Rd B0-Ca), as shown in column access pipeline 251B. Similarly, a predetermined time after receipt of the B1-Ry activation request 255, and before the t_(CC) interval for the first micro-threaded column access request 257 has elapsed, a second micro-threaded column access request 259 specifying a read at bank B1, column ‘e’ is received (Rd B1-Ce). Thus, two micro-threaded column access requests 257, 259 are received within a single t_(CC) interval and, because the requests are directed alternately to odd and even bank sets, are serviced without conflict.
More specifically, as shown at 275, read data retrieved from column ‘a’ of sub-banks B0-A and B0-B is delivered to the data interfaces 105A and 105B for transmission on the DQA and DQB links, respectively, during the first half of the t_(CC) interval starting at 271, and read data retrieved from column ‘e’ of sub-banks B1-A and B1-B is delivered to the data interfaces 105A and 105B for transmission on the DQA and DQB links during a second half of the same t_(CC) interval. By this arrangement, the 64-byte t_(CC) envelope is effectively partitioned between two micro-threaded column access operations each having a 32-byte column transaction granularity, with the data for each micro-threaded column access operation being output onto the DQA and DQB links during a respective t_(CCp) interval as shown in expanded view 272. Additionally, because the t_(CC) envelope is partitioned between column accesses directed to odd and even bank sets, the data transmitted during either partial t_(CC) interval may be carried by the existing data path interfaces 105A/105B. Accordingly, the internal data path width of the memory device 100 need not be increased to accommodate the increased data path bandwidth, avoiding the added manufacturing/operating costs and headroom issues discussed above.

Still referring to FIGS. 7 and 8, a t_(CC) interval after receipt of the first B0-directed micro-threaded column access request 257, a second micro-threaded column access request 261 directed to B0 and specifying a read at column ‘b’ is received (i.e., Rd B0-Cb). Similarly, a t_(CC) interval after receipt of the first B1-directed micro-threaded column access request 259, a second B1-directed micro-threaded column access request 263 specifying a read at column ‘f’ is received (i.e., Rd B1-Cf). The second B0-directed and B1-directed micro-threaded column access requests 261, 263 are serviced in the same manner as the first B0-directed and B1-directed micro-threaded column access requests 257, 259, resulting in transmission of B0, column ‘b’ data over the first half of the t_(CC) interval (i.e., first t_(CCp) interval) that immediately succeeds the transmission of the B1, column ‘e’ data, and transmission of B1, column ‘f’ data over the second half of the t_(CC) interval. Thus, four micro-threaded column operations are executed, resulting in four 32-byte data transfers over a single t_(RR) interval starting at 271. As each row activation yields two 32-byte column data transfers, each within a respective t_(RRp) interval as shown at 274, the row transaction granularity is 64 bytes, half the 128-byte data transfer capacity of the data path over the t_(RR) interval (i.e., half the 128-byte t_(RR) envelope).
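The time-staggered partitioning of the t_(CC) envelope can be expressed as a small scheduling model. The sketch below is illustrative only: the function name and tuple layout are hypothetical, times are measured from the start of the first data transfer, and the parameters (16 DQ links at 6.4 Gb/s, a 5 ns t_(CC) interval, two micro-threads) are those of the exemplary embodiment.

```python
def time_staggered_schedule(requests, t_cc_ps=5_000, threads=2,
                            links=16, rate_mbps=6_400):
    """Assign each column read a partial-t_CC slot on all DQ links.

    Returns (start_ps, end_ps, bank, column, bytes) tuples in transfer order.
    """
    t_ccp_ps = t_cc_ps // threads
    # Mb/s multiplied by ps yields units of 1e-6 bit.
    slot_bytes = links * rate_mbps * t_ccp_ps // 1_000_000 // 8
    return [(i * t_ccp_ps, (i + 1) * t_ccp_ps, bank, column, slot_bytes)
            for i, (bank, column) in enumerate(requests)]


# Reads in the order their data appears on the DQA/DQB links.
reads = [("B0", "a"), ("B1", "e"), ("B0", "b"), ("B1", "f")]
schedule = time_staggered_schedule(reads)

# Row transaction granularity: bytes transferred per activated row.
row_granularity = {}
for _start, _end, bank, _column, nbytes in schedule:
    row_granularity[bank] = row_granularity.get(bank, 0) + nbytes

for slot in schedule:
    print(slot)
print(row_granularity)
```

Summing the slot sizes over the four transfers gives the capacity of the data path over the 10 ns t_(RR) interval assumed in the example.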

Precharge requests 265 and 267 directed to banks B0 and B1 are received in the request interface 101 a predetermined time after receipt of the micro-threaded column access requests 261 and 263 directed to the same banks (i.e., as shown in precharge pipeline 251C). The precharge requests 265, 267 are serviced in the manner discussed above to close the open pages in the specified banks.

FIG. 9 illustrates an exemplary timing of row and column strobe signal assertions by the request decoder 301 of FIG. 5 when in a micro-threaded mode. As shown, upon decoding a row activation request 253 directed to an even bank, the request decoder 301 asserts the even-row strobe signal 321A to load even-row control registers 305A, 305B and thereby deliver bank and row address values to the row decoders for the even banks. After a partial t_(RR) interval (t_(RRp)) has elapsed, a row activation request 255 directed to an odd bank is received. Upon decoding the odd-bank row activation request 255, the request decoder 301 asserts the odd-row strobe signal 321B to load odd-bank row control registers 307A, 307B and thereby deliver bank and row address values to the row decoders for the odd banks. Thus, assuming a fully loaded row activation pipeline, the request decoder alternately asserts the even-row and odd-row strobe signals 321A, 321B after each t_(RRp) interval to deliver bank and row address values alternately to the row decoders for the odd and even bank sets.

Still referring to FIG. 9, upon decoding a column access request 257 (i.e., a micro-threaded column access request) directed to an even bank, the request decoder 301 asserts the even-column strobe signal 323A to load even-bank column control registers 309A, 309B and thereby deliver bank and column address values to the column decoders for the even banks. After a partial t_(CC) interval (t_(CCp)) has elapsed, a column access request 259 directed to an odd bank is received. Upon decoding the odd-bank column access request 259, the request decoder 301 asserts the odd-column strobe signal 323B to load odd-bank column control registers 311A, 311B and thereby deliver bank and column address values to the column decoders for the odd banks. Thus, assuming a fully loaded column access pipeline, the request decoder 301 alternately asserts the ECS and OCS signals after each t_(CCp) interval to deliver bank and column address values alternately to the column decoders for the odd and even bank sets. As shown in FIG. 8, the alternating assertion of the ECS and OCS signals enables a time-staggered transfer of column data for multiple micro-threaded column access operations within a single t_(CC) interval. The ECS and OCS signals are asserted by the request decoder 301 during a second t_(CC) interval in response to column access requests 261 and 263, respectively, thereby enabling the time-staggered data transfer to be repeated during a subsequent t_(CC) interval as shown in FIG. 8.
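The alternating strobe pattern for a fully loaded column access pipeline can be sketched as a simple sequence generator. This is an illustrative model only; the function name is hypothetical, times are relative to the first strobe assertion, and the 2.5 ns partial interval assumes the 5 ns t_(CC) interval of the example divided between two micro-threads.

```python
from itertools import cycle, islice


def strobe_sequence(count, t_ccp_ps=2_500, strobes=("ECS", "OCS")):
    """Return (time_ps, strobe) pairs, one assertion per partial t_CC interval."""
    return [(i * t_ccp_ps, strobe)
            for i, strobe in enumerate(islice(cycle(strobes), count))]


# Column access requests 257, 259, 261 and 263 map to these assertions.
column_strobes = strobe_sequence(4)
print(column_strobes)
```

Passing `strobes=("ERS", "ORS")` with a partial t_(RR) interval in place of `t_ccp_ps` models the row strobe alternation in the same way.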

FIG. 10 illustrates exemplary link-staggered micro-threaded memory transactions that may be performed in an alternative embodiment of the memory device 100 of FIG. 2. Row activation requests 253 and 255, micro-threaded column access requests 257, 259, 261 and 263, and precharge requests 265 and 267 are received in the request interface 101 and processed in generally the same manner as discussed above in reference to FIGS. 7-9. However, instead of subdividing the t_(CC) envelope temporally (i.e., time-staggering the column data output in response to micro-threaded column access requests received in the same t_(CC) interval), the t_(CC) envelope is subdivided spatially through concurrent data transfer of the column data for same-t_(CC)-interval micro-threaded column access requests on different portions of the DQA and DQB data paths as shown at 375; an operation referred to herein as link staggering. That is, the column data transferred in response to micro-threaded column access request 257 is transmitted via a first subset of the DQA links and a first subset of the DQB links (e.g., DQA[3:0] and DQB[3:0]), while column data transferred in response to micro-threaded column access request 259 is concurrently (i.e., partly or completely overlapping in time) transmitted via a second subset of the DQA links and a second subset of the DQB links (e.g., DQA[7:4] and DQB[7:4]). Similarly, the column data transferred in response to micro-threaded column access requests 261 and 263 is transmitted during a subsequent t_(CC) interval over the first and second subsets of DQ links, respectively. The t_(CC) envelope remains at 64 bytes, with the column transaction granularity and row transaction granularity being 32 bytes and 64 bytes, respectively, as in the temporally-staggered embodiment of FIG. 8.
Thus, the link-staggered approach provides effectively the same benefits as the temporally-staggered approach in terms of reduced row and column transaction granularity, but does so by allocating respective portions of the DQ paths to service the micro-threaded column access requests, instead of respective portions of the t_(CC) interval.
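The link-staggered allocation can be sketched in the same style as a mapping from requests to link subsets. The function name and return layout below are hypothetical, and the sketch assumes eight links per DQ path divided evenly among the requests serviced in one t_(CC) interval, as in the exemplary embodiment.

```python
def link_staggered_allocation(requests, links_per_path=8,
                              t_cc_ps=5_000, rate_mbps=6_400):
    """Map each same-t_CC request to a DQA/DQB link subset for a full t_CC.

    Returns {request: ([link group names], bytes transferred)}.
    """
    group = links_per_path // len(requests)
    allocation = {}
    for i, request in enumerate(requests):
        low, high = i * group, (i + 1) * group - 1
        lanes = [f"DQA[{high}:{low}]", f"DQB[{high}:{low}]"]
        # Two paths (DQA and DQB) of `group` links each, over a full t_CC.
        nbytes = 2 * group * rate_mbps * t_cc_ps // 1_000_000 // 8
        allocation[request] = (lanes, nbytes)
    return allocation


allocation = link_staggered_allocation([("B0", "a"), ("B1", "e")])
for request, (lanes, nbytes) in allocation.items():
    print(request, lanes, nbytes)
```

Each request still moves 32 bytes, but over half the links for the whole t_(CC) interval rather than over all links for half of it.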

FIGS. 12A and 12B illustrate exemplary data path interfaces 401 and 411 that may be used to support the time-staggered and link-staggered data transfers shown in FIGS. 8 and 10, respectively. The data path interface 401 of FIG. 12A corresponds generally to the data path interface 105A described in reference to FIG. 3 and includes DQA pads 171 and a pair of transceivers 403 and 405 coupled between the pads 171 and respective column data paths 119 ₀ and 119 ₁. Each transceiver 403, 405 includes a data serializer 177 and transmitter 175 to generate an output data stream, and a receiver 179 and data deserializer 181 to receive an incoming data stream. More specifically, in the exemplary embodiment shown, the data serializer 177 converts 128-bit column data values received via the column data path 119 into a sequence of sixteen 8-bit values (e.g., picking off one byte at a time in round-robin fashion from each of sixteen different offsets within the 128-bit column data) which are delivered to the transmitter 175 for transmission in respective transmission intervals via the DQA pads 171. Conversely, a sequence of sixteen 8-bit values recovered by the receiver 179 are delivered to the data deserializer 181 which gathers the values into a 128-bit column data value that is provided to the column decoder via column data path 119. In the time-staggered data transfer operation illustrated in FIG. 8, the data transfer path through transceiver 403 is used during a first partial t_(CC) interval, and the data transfer path through transceiver 405 is used during the second partial t_(CC) interval, as indicated by arrows 408A and 408B which are shaded to correspond to the column access transactions of FIG. 8.

In the data path interface 411 of FIG. 12B, the eight links of the DQA path are subdivided into two groups of four links, DQA[3:0] and DQA[7:4], and used to transfer data for respective micro-threaded column transactions. Accordingly, a first pair of transceivers 413A/415A is coupled to a first set of four DQA pads 171A and a second pair of transceivers 413B/415B is coupled to a second set of four DQA pads 171B, with each individual transceiver including an output data path formed by a data serializer 427 and transmitter 425, and an input data path formed by a receiver 429 and data deserializer 431. As each 128-bit column data value is transferred over half the number of signal links (e.g., four instead of eight), the data serializer is a 1:32 data serializer (i.e., converting the 128-bit column data value into a sequence of thirty-two 4-bit data values) instead of the 1:16 data serializer of FIG. 12A, and the data deserializer is a 32:1 data deserializer (gathering a sequence of thirty-two 4-bit values into a 128-bit column data value) instead of the 16:1 data deserializer 181 of FIG. 12A. Thus, each transceiver 413A, 415A, 413B, 415B transfers 128-bit column data between a column data path 119 and a smaller number of DQ links, but over twice the interval (i.e., over a full t_(CC) interval rather than a half t_(CC) interval).

As in the embodiment of FIG. 12A, the micro-threaded column transactions serviced in the same t_(CC) interval are directed alternately to odd and even banks so that one set of four data path links is fed by data from (or feeds data to) one of the column data paths 119₀ and 119₁ over a given t_(CC) interval, and the other set of four data path links is concurrently fed by data from (or feeds data to) the other of the column data paths 119₀ and 119₁ during the t_(CC) interval. This data flow arrangement is shown, for example, by arrows 418A and 418B, which are shaded to correspond to the column access transactions of FIG. 10.

Comparing FIGS. 12A and 12B, it can be seen that the data path interfaces 401 and 411 differ primarily in the operation of the data serializer and data deserializer circuits. That is, the two four-link transmitters 425 in transceivers 413A and 413B may be implemented and connected to the eight DQA pads (i.e., 171A and 171B) so as to be equivalent to the single eight-link transmitter 175 in transceiver 403, and the two four-link receivers 429 in transceivers 413A and 413B may likewise be equivalent to the eight-link receiver 179 in transceiver 403. The 1:16 data serializers 177 in transceivers 403 and 405 differ from the 1:32 data serializers 427 in transceivers 413A, 413B, 415A and 415B primarily in the manner in which incoming 128-bit column data is distributed to the DQA data pads: data serializer 177 delivers 8-bit chunks of the 128-bit column data to eight DQA pads over 16 transmission intervals, and data serializer 427 delivers 4-bit chunks of the 128-bit column data to four DQA pads over 32 transmission intervals. The 16:1 data deserializers and 32:1 data deserializers differ similarly in the manner of data distribution from DQA pads to column data values. Thus, in one embodiment, the data path interfaces 401 and 411 are implemented by a single data path interface circuit having data serializer and data deserializer circuits that are configurable (e.g., through mode register programming) to support either link-staggered or time-staggered data transfer in response to micro-threaded column access requests.
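The difference between the 1:16 and 1:32 serializer modes can be illustrated with a minimal software sketch. This is a behavioral model only, not the circuit of FIGS. 12A and 12B: the function names, the `links` parameter and the most-significant-first symbol ordering are assumptions made for illustration, and the description above does not fix the order in which bytes or nibbles are picked off.

```python
COLUMN_DATA_BITS = 128  # width of one column data value, per the description


def serialize(column_data, links):
    """Split a 128-bit column data value into link-wide symbols.

    links=8 models the 1:16 mode (sixteen 8-bit symbols);
    links=4 models the 1:32 mode (thirty-two 4-bit symbols).
    The most-significant-first ordering is an assumption of this sketch.
    """
    if not 0 <= column_data < (1 << COLUMN_DATA_BITS):
        raise ValueError("column data must fit in 128 bits")
    if COLUMN_DATA_BITS % links != 0:
        raise ValueError("link count must divide 128")
    count = COLUMN_DATA_BITS // links
    mask = (1 << links) - 1
    return [(column_data >> (links * (count - 1 - i))) & mask
            for i in range(count)]


def deserialize(symbols, links):
    """Gather link-wide symbols back into one column data value."""
    value = 0
    for symbol in symbols:
        value = (value << links) | symbol
    return value
```

In this model a single pair of routines covers both modes, with the link count standing in for the mode register setting mentioned above.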

Referring to FIGS. 10 and 12B, it should be noted that, because column data retrieved in response to micro-threaded column access request 257 will become available for transfer before column data retrieved in response to micro-threaded column access request 259, the first-retrieved column data may be buffered (e.g., within the data path interface 411) until the second-retrieved column data is available, thereby enabling simultaneous (i.e., fully concurrent) transfer of the column data for the two access requests. Alternatively, as shown in FIG. 11, the first-retrieved column data (i.e., retrieved in response to request 257) may be output as soon as it becomes available, thus resulting in a time-staggering of the same-row transmissions on the upper and lower partitions of the data path as shown at 385, with transfer of the second-retrieved column data being delayed relative to transfer of the first-retrieved column data by a partial t_(CC) interval (i.e., t_(CCp)). The transmissions over the signal path partitions are thus partially overlapped (but still concurrent over a partial t_(CC) interval). Such an approach may be desirable in some applications, as no buffering of column data is required and a single, deterministic memory access latency applies to each micro-threaded column access. By contrast, if column data retrieved in response to a first micro-threaded column access request is buffered to enable simultaneous, link-staggered transfer with column data retrieved in response to a second micro-threaded column access request, the two micro-threaded column access requests may have different, though still deterministic, memory access latencies.

FIGS. 13 and 14 illustrate an exemplary sequence of micro-threaded memory transactions that may be performed in the memory device 100 of FIG. 2 when operated in an alternative micro-threaded mode at full signaling rate (e.g., data and request bandwidth doubled over the legacy 3.2 Gb/s and 0.8 Gb/s signaling rates described above). In the alternative micro-threaded mode, referred to herein as a sub-bank micro-threaded mode, the number of micro-threaded column transactions per t_(CC) envelope is doubled relative to the micro-threaded mode of FIGS. 7 and 8 by increasing the number of column addresses provided in each column access request, and by applying the pairs of column addresses delivered in each column access request to different sub-banks of the same bank. Referring to the row activation pipeline 461A, column access pipeline 461B and precharge pipeline 461C shown in FIG. 14, for example, a sequence of row activation requests 253, 255 is received as in FIG. 8 (i.e., with the pair of requests received within each t_(RR) interval being directed alternately to even and odd banks), and a sequence of micro-threaded column access requests 467, 469, 471, 473 is also received as in FIG. 8, but with each access request including a bank address and two column addresses as shown at 480. By applying the two column addresses against alternating sub-banks of a specified even or odd bank (i.e., sub-bank micro-threading), two distinct 16-byte column data values may be retrieved per micro-threaded column access, thus achieving 16-byte column transaction granularity. In the particular example of FIGS. 13 and 14, for instance, a first micro-threaded column access request 467 specifies a read at column ‘a’ of sub-bank B0-A and a read at column ‘c’ of sub-bank B0-B (i.e., Rd B0-Ca/Cc), while a second micro-threaded column access request 469 received within the same t_(CC) interval specifies a read at column ‘e’ of sub-bank B1-A and a read at column ‘g’ of sub-bank B1-B (Rd B1-Ce/Cg). In a subsequent t_(CC) interval, two additional micro-threaded column access requests 471 and 473 are received, the first specifying a read at columns ‘b’ and ‘d’ of the B0-A and B0-B sub-banks (Rd B0-Cb/Cd), and the second specifying a read at columns ‘f’ and ‘h’ of the B1-A and B1-B sub-banks (Rd B1-Cf/Ch). As shown at 275, the t_(CC) envelope is temporally subdivided between successive t_(CCp) intervals and spatially subdivided between DQA and DQB data path links to accommodate the four column data transfers that correspond to the four column addresses received in each pair of micro-threaded column access requests. Thus, through sub-bank micro-threading, the column transaction granularity is reduced to 16 bytes without reduction in the aggregate amount of transferred data (i.e., peak bandwidth of the memory device is maintained). The row transaction granularity remains at 64 bytes as four 16-byte column data transfers are performed per activated row.

FIG. 15 illustrates an embodiment of a request interface 500 that may be used within the memory device 100 of FIG. 2 (i.e., to implement request interface 101) to enable the micro-threaded memory transactions described in reference to FIGS. 13 and 14. As in the embodiment of FIG. 5, the request interface 500 includes a request decoder 501 to process an incoming stream of requests (i.e., received via pads 303 and deserialized, if necessary, by optional request deserializer 304), even-bank row control registers 305A, 305B, odd-bank row control registers 307A, 307B, even-bank column control registers 309A, 309B and odd-bank column control registers 311A, 311B. The request interface 500 also includes a row bus 315 coupled between the request decoder 501 and the even- and odd-bank row control registers 305, 307, a first column bus 503A coupled between the request decoder 501 and the even-bank column control registers 309 and a second column bus 503B coupled between the request decoder 501 and the odd-bank column control registers 311.

Upon decoding a row activation request, the request decoder 501 outputs the row address and bank address onto the row bus 315 and, as discussed in reference to FIG. 5, asserts either an even-row strobe signal 321A or odd-row strobe signal 321B (i.e., depending on whether the request is directed to an even bank or odd bank) to load the bank and row address values into either the even-bank row control registers 305 or odd-bank row control registers 307. Upon decoding a micro-threaded column access request having a bank address and two column addresses (i.e., a sub-bank micro-threaded column access), the request decoder 501 outputs the first and second column addresses on the first and second column buses 503A and 503B, respectively, then asserts either an even-column strobe signal 323A or odd-column strobe signal 323B (i.e., depending on whether the request is directed to an even bank or odd bank) to load the first column address into either the even-bank or odd-bank column control register (309A or 311A) for the group A sub-banks (i.e., referring to FIG. 13, the sub-banks in quadrants Q0 and Q1), and to load the second column address into the corresponding even-bank or odd-bank column control register (309B or 311B) for the group B sub-banks (i.e., sub-banks in quadrants Q2 and Q3). In one embodiment, the bank address value received in the sub-bank micro-threaded column access request is output onto both the first and second column buses 503A and 503B and loaded, along with the first and second column addresses, into either the even-bank or odd-bank column control registers 309 or 311 in response to assertion of the even-column or odd-column strobe signals 323A, 323B. In an alternative embodiment, a separate bank address bus may be provided and coupled in common to the even-bank and odd-bank column control registers 309 and 311 (and to the request decoder 501) to enable a bank address supplied thereon to be loaded into bank address fields within those registers (or into separate bank address registers). Also, in another embodiment, a single column bus is coupled to all the column control registers 309, 311 and time-multiplexed to load the first column address into one of column control registers 309A and 311A, and the second column address into one of column control registers 309B and 311B in respective address transfer operations. In such an embodiment, distinct strobe signals may be provided to each of the four column control registers 309A, 309B, 311A, 311B to enable one column control register to be loaded at a time. Also, in all such embodiments, the two column addresses may be independently specified by the incoming micro-threaded column access request, or may be specified in relation to one another. For example, one of the two addresses may be specified as an arithmetic or logical offset from the other. The offset value may be specified in the column access request, or the column access request may include a value that is used indirectly to determine the offset value, for example, by indexing a lookup table of offset values.
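The options for specifying the second column address in relation to the first can be sketched as follows. This is a hedged illustration only: the 6-bit column address width, the use of XOR as the logical offset, the wrap-around on overflow and the contents of `OFFSET_TABLE` are all assumptions made for the example and are not specified in the description.

```python
COLUMN_ADDRESS_BITS = 6                      # assumed address width
COLUMN_MASK = (1 << COLUMN_ADDRESS_BITS) - 1

# Hypothetical lookup table of offset values; the contents are illustrative.
OFFSET_TABLE = (1, 2, 4, 8)


def second_column_arithmetic(first, offset):
    """Second address as an arithmetic offset from the first (wraps)."""
    return (first + offset) & COLUMN_MASK


def second_column_logical(first, mask):
    """Second address as a logical (here, XOR) offset from the first."""
    return (first ^ mask) & COLUMN_MASK


def second_column_indexed(first, index):
    """Second address from an offset looked up by a value in the request."""
    return second_column_arithmetic(first, OFFSET_TABLE[index])
```

Carrying an offset or a table index instead of a full second address would reduce the number of request bits needed, at the cost of restricting which column pairs can be addressed together.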

In the embodiment of FIG. 15, the request decoder 501 may assert the register strobe signals 321A, 321B, 323A and 323B at the times shown in FIG. 9, thereby enabling the two column addresses received in each sub-bank micro-threaded column access request to be applied concurrently within each t_(CCp) interval (i.e., simultaneously or at least partly overlapping in time) to retrieve respective sets of data from different columns of the same bank. In an embodiment having a shared, time-multiplexed column bus, four column strobe signals (e.g., ECS1, ECS2, OCS1, OCS2) may be asserted in succession to enable two column control register load operations per t_(CCp) interval. For example, signals ECS1 and ECS2 may be asserted one after another during a first t_(CCp) interval, and OCS1 and OCS2 asserted one after another during a second t_(CCp) interval.

FIGS. 16 and 17 illustrate exemplary micro-threaded memory operations in which separate row and column addresses are used to access sub-banks in each of the four quadrants, Q₀-Q₃, of a memory device 530 within a single t_(CC) interval. More precisely, because the request interface 531 of memory device 530 delivers unique bank, row and column addresses to the address decoders (i.e., column decoders 103 and row decoders 113) for each quadrant, the storage arrays in each quadrant are effectively converted from sub-banks to banks, thereby yielding a sixteen-bank architecture having banks B0-B15 as shown in FIG. 16. As in the other memory device embodiments discussed above, memory device 530 may have more or fewer storage arrays in alternative embodiments, yielding correspondingly more or fewer banks. Also, any of the sixteen banks may include two or more constituent sub-banks.

Referring to FIGS. 16 and 17, the row activation pipeline 551A is more densely loaded than in previously described embodiments to deliver row activation requests 553, 555, 557 and 559 directed to banks within each of the four quadrants of the memory device 530 in a single t_(RR) interval. That is, the t_(RR) interval is sub-divided into four t_(RRp) intervals, with a row activation request being received in each. In the specific example shown in FIGS. 16 and 17, rows ‘w’, ‘x’, ‘y’ and ‘z’ are activated one after another in banks B0, B8, B9 and B1, respectively (i.e., Act B0-Rw, Act B8-Rx, Act B9-Ry and Act B1-Rz), though rows may be activated in each of the four quadrants in different order in subsequent transactions or in alternative embodiments.

Referring to column access pipeline 551B, a predetermined time after the first row activation request 553 is received, a pair of dual-address micro-threaded column access requests 561, 563 is received over a first t_(CC) interval 550, thereby delivering four bank addresses and four column addresses that may be applied against pages opened in the four quadrants in response to activation requests 553, 555, 557 and 559. In the particular example shown, the first dual-address micro-threaded column access request 561 includes a first pair of addresses, Bank Addr1 and Col Addr1 as shown at 577, that specify an access at column ‘a’ of bank B0 (i.e., in the open page thereof), and a second pair of addresses, Bank Addr2 and Col Addr2, that specify an access at column ‘b’ of bank B8 (i.e., Rd B0-Ca/B8-Cb). The second dual-address micro-threaded column access request 563 specifies an access at column ‘c’ of bank B1 and column ‘d’ of bank B9 (Rd B1-Cc/B9-Cd). Because each of the four bank addresses provided in the pair of access requests 561 and 563 specifies a bank in a different quadrant of the memory device 530, no column decoder or data path conflict arises in servicing the requests. Accordingly, a predetermined time after receipt of the four address values in the pair of access requests 561 and 563, four corresponding sets of column data are transmitted via the data path over a t_(CC) interval that begins at 577. As shown at 575, the four sets of column data are transmitted in both link-staggered manner (i.e., two sets of data transferred on the DQA links and two sets transferred on the DQB links) and time-staggered manner (the column data corresponding to the first and second access requests 561 and 563 being transferred in first and second t_(CCp) intervals, respectively) within the 64-byte t_(CC) envelope, with each set of column data having a 16-byte column transaction granularity.

Still referring to FIGS. 16 and 17, a second pair of dual-address micro-threaded column access requests 565 and 567 is received in the t_(CC) interval that immediately follows interval 550. As in the first pair of access requests 561 and 563, the four bank addresses carried within the second pair of access requests specify accesses at selected columns of the previously activated rows of banks in each of the four quadrants of the memory device 530 (e.g., Rd B0-Ce/B8-Cf and Rd B1-Cg/B9-Ch). Accordingly, a predetermined time after receipt of the four address values in the second pair of micro-threaded column access requests 565 and 567, four corresponding sets of column data are transmitted via the data path. The four sets of column data are transmitted link-staggered and time-staggered within the 64-byte t_(CC) envelope, with each set of column data having a 16-byte column transaction granularity. The t_(RR) envelope remains at 128 bytes (i.e., twice the t_(CC) envelope as t_(RR)=2t_(CC) in this example), but, due to the four-way partitioning of the t_(RR) interval, the row transaction granularity is reduced to 32 bytes.
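The granularity figures in this example follow from dividing each envelope by the number of transactions that share it. The short sketch below restates that arithmetic using the envelope sizes given above; the function and constant names are illustrative only.

```python
T_CC_ENVELOPE_BYTES = 64                         # from the description
T_RR_ENVELOPE_BYTES = 2 * T_CC_ENVELOPE_BYTES    # t_RR = 2 * t_CC here


def granularity(envelope_bytes, transactions):
    """Bytes per transaction when an envelope is evenly partitioned."""
    if envelope_bytes % transactions != 0:
        raise ValueError("envelope is not evenly divisible")
    return envelope_bytes // transactions


# Four column transfers share each t_CC envelope
# (two link groups, DQA and DQB, times two t_CCp intervals).
column_granularity = granularity(T_CC_ENVELOPE_BYTES, 4)

# Four row activations share each t_RR interval (one per t_RRp interval).
row_granularity = granularity(T_RR_ENVELOPE_BYTES, 4)
```

Under these inputs the column transaction granularity evaluates to 16 bytes and the row transaction granularity to 32 bytes, matching the values stated above.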

Still referring to FIG. 17, the increased request density in row activation pipeline 551A may eliminate request intervals otherwise used to issue precharge requests. In one embodiment, precharge requests that would otherwise be transmitted during the intervals shown at 581 and 583 of a precharge request pipeline 551C are instead handled by a sub-field within column access requests. For example, as shown at 577 and 579, a precharge bit (or bits) may be included with each column access request to indicate whether a precharge operation is to be automatically performed upon conclusion of the requested access. Thus, in access request 561, the precharge bit is reset (e.g., “Prechg=0” as shown at 577) to defer the precharge operation, leaving the pages of the specified banks (B0 and B8) open for one or more subsequent column accesses. In access request 567, the precharge bit is set (e.g., “Prechg=1” as shown at 579), thereby instructing the memory device 530 to perform precharge operations in the specified banks (B1 and B9) at the conclusion of the specified column accesses.

FIG. 18 illustrates an embodiment of a request interface 600 that may be used to implement request interface 531 within the memory device 530 of FIG. 16 and to support the four-way micro-threaded transactions described in reference to FIG. 17. As in the embodiments of FIGS. 5 and 15, the request interface 600 includes a request decoder 601 to process an incoming stream of requests (i.e., received via pads 303 and deserialized, if necessary, by optional request deserializer 304), even-bank row control registers 305A, 305B, odd-bank row control registers 307A, 307B, even-bank column control registers 309A, 309B and odd-bank column control registers 311A, 311B. The request interface 600 also includes a row bus 315 coupled between the request decoder 601 and the even-bank and odd-bank row control registers 305, 307 and, as in the embodiment of FIG. 15, first and second column buses 503A and 503B coupled to the even-bank column control registers 309 and odd-bank column control registers 311, respectively.

Upon decoding a row activation request, the request decoder 601 outputs the row address and bank address onto the row bus 315, then asserts one of four row-register strobe signals 605A, 605B, 607A or 607B (ERSA, ERSB, ORSA, ORSB) according to the quadrant specified in the two least significant bits (or other bits) of the bank address; an address field referred to herein as the quadrant address. Assuming, for example, that the incoming stream of activation requests is directed in round-robin fashion to quadrants Q0, Q2, Q1 and Q3 of the FIG. 16 memory device 530, then the row-register strobe signals 605A, 605B, 607A and 607B are asserted one after another in respective t_(RRp) intervals. Other quadrant address sequences may be used in alternative embodiments, resulting in a different sequence of row-register strobe signal assertions.
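The strobe selection described above can be modeled as a small decode step. The sketch below is an assumption-laden illustration rather than the decoder logic itself: it takes the quadrant address from the two least significant bank address bits (the description allows other bits to be used), and the quadrant-to-strobe mapping is inferred from the round-robin example, in which the strobes ERSA, ERSB, ORSA and ORSB are asserted in that order.

```python
# Mapping inferred from the round-robin example above; treat as illustrative.
STROBE_BY_QUADRANT = {0: "ERSA", 2: "ERSB", 1: "ORSA", 3: "ORSB"}


def quadrant_address(bank_address):
    """Quadrant address taken from the two least significant bits.

    The description notes that other bank address bits may be used instead.
    """
    return bank_address & 0b11


def row_register_strobe(bank_address):
    """Name of the row-register strobe asserted for a row activation."""
    return STROBE_BY_QUADRANT[quadrant_address(bank_address)]
```

A decoder that uses different bank address bits for the quadrant address would change only `quadrant_address`, leaving the strobe lookup unchanged.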

Upon decoding a dual-address micro-threaded column access (e.g., request 561 of FIG. 17), the request decoder 601 outputs the first bank address and column address values therein onto the first column bus 503A and the second bank address and column address values therein (BA2, CA2) onto the second column bus 503B, then asserts either an even-column strobe signal 323A (ECS) or odd-column strobe signal 323B (OCS) according to whether the pair of addressed banks are even or odd. In alternative embodiments, to avoid restrictions on the pair of banks addressed in a given multi-threaded column access request (e.g., enabling a column access directed to an odd bank to be paired with a column access directed to an even bank), separate column strobe signals and column address buses may be provided to each of the column control registers 309A, 309B, 311A and 311B, with any pair of the column strobe signals asserted to enable the corresponding column control registers to be simultaneously loaded. Also, as discussed above in reference to FIG. 15, a single time-multiplexed column bus may be coupled to all the column control registers 309, 311 to enable sequential loading of selected column control registers in any order.

FIG. 19 illustrates an exemplary timing of control signal assertions by the request decoder 601 of FIG. 18. In the particular example shown, row activation requests 553, 555, 557 and 559 are received in respective t_(RRp) intervals, with each request being directed to a different quadrant in the exemplary order described in reference to FIGS. 16 and 17 (other row activation orders may be used). Thus, the request decoder 601 asserts the four row-register strobe signals ERSA, ERSB, ORSA, ORSB in respective t_(RRp) intervals, as shown, to transfer the bank and row address values received in the four activation requests to the address-specified row registers 305, 307. Assuming that the same pattern of row activation requests is received in subsequent t_(RR) intervals (i.e., same quadrant ordering, but arbitrary intra-quadrant bank address and row address), each of the four row-register strobe signals ERSA, ERSB, ORSA, ORSB may be asserted once per t_(RR) interval in round-robin fashion.

Still referring to FIG. 19, column strobe signals ECS and OCS are asserted on partial t_(CC) intervals (i.e., t_(CCp)) as in the embodiment of FIG. 9. As discussed, if the request decoder supports an arbitrary column control register loading sequence, four distinct column strobe signals may be generated by the request decoder and asserted in respective t_(CCp) intervals in any order. If the incoming column access requests specify the same quadrant access order in each t_(RR) cycle (or any group of t_(RR) cycles), each of the column strobe signals may be asserted in round-robin fashion once per t_(RR) interval or, in the case of the shared column strobe signals (ECS and OCS) shown in FIGS. 18 and 19, once per t_(CC) interval.

FIGS. 20A and 20B illustrate exemplary row requests that may be issued to the memory devices 100 and 530 described above to initiate row operations (e.g., row activation operations and precharge operations). More specifically, FIG. 20A illustrates an exemplary ST-mode (single-threaded mode) row request that is issued when the memory devices 100, 530 are operated in a single-threaded mode, and FIG. 20B illustrates an exemplary MT-mode (micro-threaded mode) row request issued when the memory devices are operated in a micro-threaded mode. In the particular embodiment shown, each row request is issued in two successive transfers (e.g., during odd and even phases of a clock signal or other timing signal) over a 12-bit request path (RQ0-RQ11) and therefore includes 24 bits. As shown, the ST-mode request includes a three-bit opcode formed by bits “OP,” a three-bit bank address, BA0-BA2, and an eleven-bit row address, R0-R10. The opcode indicates the type of row operation to be performed (e.g., row activation or precharge), the bank address indicates which of the eight banks the row operation is directed to, and the row address indicates, at least in the case of a row activation operation, the row of the selected bank in which the operation is to be performed. The remaining seven bits of the ST-mode row request may be reserved (i.e., as indicated by the designation “rsrv”) or used to carry information for controlling other functions within the memory device. The MT-mode row request of FIG. 20B is substantially the same as the ST-mode row request, except that one of the reserved bits in the ST-mode request (e.g., the even phase bit transferred on request link RQ3) is optionally used to carry an additional bank address bit, BA3, thereby enabling selection of one of sixteen banks within the 16-bank memory device 530 of FIG. 16. In alternative embodiments, the row requests of FIGS. 20A and 20B may have different formats, different numbers of bits and may be transmitted in more or fewer transfers over a wider or narrower request path.
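As a rough illustration of the field widths described above, the sketch below packs and unpacks a 24-bit row request. Only the field widths (three-bit opcode, three- or four-bit bank address, eleven-bit row address) come from the description; the bit positions chosen here are a hypothetical layout and do not reproduce the link and phase assignments of FIGS. 20A and 20B.

```python
REQUEST_BITS = 24   # two transfers over the 12-bit request path
OPCODE_BITS = 3
ROW_BITS = 11


def _bank_bits(micro_threaded):
    # ST mode: BA0-BA2; MT mode optionally adds BA3 for sixteen banks.
    return 4 if micro_threaded else 3


def pack_row_request(opcode, bank, row, micro_threaded=False):
    """Pack fields into a 24-bit word (hypothetical layout).

    From the least significant bit upward: row, bank, opcode.
    Any remaining upper bits stand in for reserved bits and stay zero.
    """
    bank_bits = _bank_bits(micro_threaded)
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    if not 0 <= bank < (1 << bank_bits):
        raise ValueError("bank address out of range for this mode")
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row address out of range")
    return (opcode << (bank_bits + ROW_BITS)) | (bank << ROW_BITS) | row


def unpack_row_request(word, micro_threaded=False):
    """Recover (opcode, bank, row) from a word built by pack_row_request."""
    bank_bits = _bank_bits(micro_threaded)
    row = word & ((1 << ROW_BITS) - 1)
    bank = (word >> ROW_BITS) & ((1 << bank_bits) - 1)
    opcode = (word >> (bank_bits + ROW_BITS)) & ((1 << OPCODE_BITS) - 1)
    return opcode, bank, row
```

In this model the only difference between the two modes is the width of the bank field, which mirrors the way a single reserved bit is repurposed as BA3 in the MT-mode request.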

FIGS. 21A and 21B illustrate exemplary column requests that may be issued to the memory devices 100 and 530 described above to initiate column access operations (e.g., read operations and write operations). More specifically, FIG. 21A illustrates an exemplary ST-mode (single-threaded mode) column request that is issued when the memory devices 100, 530 are operated in a single-threaded mode, and FIG. 21B illustrates an exemplary MT-mode (micro-threaded mode) column request issued when the memory devices are operated in a micro-threaded mode. In the embodiments shown, the ST-mode and MT-mode column requests are the same size as the corresponding ST-mode and MT-mode row requests (i.e., 24-bit requests formed by odd and even phase transfers over the 12-bit request path, RQ0-RQ11), but may be larger or smaller than the row requests in alternative embodiments. The ST-mode column request includes a five-bit opcode to specify the type of column access to be performed (e.g., read, write, masked write, etc.), a three-bit bank address, BC0-BC2, to specify one of eight open pages to be accessed (i.e., an open page for one of the eight banks), and a six-bit column address, C4-C9, to specify one of 64 column locations (also called column offsets) within the open page at which the specified column access operation is to be performed. Ten bits of the ST-mode column access request are reserved or allocated to other functions.

The MT-mode column request is similar to the ST-mode column access request except that the reserved bits of the ST-mode column access request are used to carry a second bank address, BCy0-BCy2, and a second column address, Cy4-Cy9, the first bank address and first column address being carried in the same bits as the bank address and column address of the ST-mode column request, but designated BCx0-BCx2 and Cx4-Cx9. By this arrangement, each column request may carry the two distinct bank and column addresses used in the sixteen-bank memory device described in reference to FIGS. 16-19. In alternative embodiments, a second column address, but not a second bank address, may be provided in the MT-mode column request (e.g., as in the embodiments described in reference to FIGS. 13-15) and in other alternative embodiments, a single column address and single bank address are provided per column access request (e.g., as in the embodiments described in reference to FIGS. 7-9). Also, in the embodiment described in reference to FIG. 17, the higher density of row activation commands in the row activation pipeline consumes the request path bandwidth that might otherwise be used to transfer precharge commands. Accordingly, in the exemplary MT-mode column request of FIG. 21B, one bit (i.e., the odd phase bit transferred over the RQ11 link) is used to indicate whether an auto-precharge operation (AP) is to be performed at the conclusion of the indicated column access operation. In alternative embodiments, the ST-mode and/or MT-mode column requests may have different formats, different numbers of bits and may be transmitted in more or fewer transfers over a wider or narrower request path.
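The bit budget of the two column request formats can be checked with a few lines of arithmetic. The field widths below are taken from the description; the dictionary names and field labels are illustrative, and the sketch assumes that the auto-precharge bit is drawn from the same pool of ten ST-mode reserved bits as the second bank and column addresses.

```python
REQUEST_BITS = 24  # two transfers over the 12-bit request path

# ST-mode column request fields (widths from the description).
ST_FIELDS = {"opcode": 5, "bank": 3, "column": 6, "reserved": 10}

# MT-mode column request fields; assumes AP comes out of the reserved bits.
MT_FIELDS = {
    "opcode": 5,
    "bank_x": 3,
    "column_x": 6,
    "bank_y": 3,
    "column_y": 6,
    "auto_precharge": 1,
}

st_total = sum(ST_FIELDS.values())
mt_total = sum(MT_FIELDS.values())

# Bits the MT-mode format adds relative to the ST-mode address fields.
mt_added = (MT_FIELDS["bank_y"] + MT_FIELDS["column_y"]
            + MT_FIELDS["auto_precharge"])
```

Under that assumption both formats total 24 bits, and the second bank address, second column address and auto-precharge bit together account for exactly the ten bits that are reserved in the ST-mode request.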

FIGS. 22 and 23 illustrate exemplary micro-threaded memory operations in a memory device 700 having a request interface 701 and data path interface 705A, 705B to interface with legacy request and data paths. Referring to FIG. 23, the request interface receives row and column requests (including address components thereof) via a 19-bit request path 730 formed by a reset line (RESET), chip-select line (CS), row-address-strobe line (RAS), column-address-strobe line (CAS), write-enable line (WE), three bank address lines (BA[2:0]) and eleven address lines (A[10:0]). The data path interface is coupled to an external data path 732 formed by 32 data lines (DQ), four data-mask lines (DM), four read data strobe lines (RDQS), and four write data strobe lines (WDQS). The data-mask lines are used to carry respective mask bits during masked-write operations, with each mask bit indicating whether a corresponding byte carried on the DQ lines is to be written or not. The read data strobe lines carry read data strobe signals output from the memory device to time reception of corresponding read data in the memory controller or other control device. The write data strobe lines carry write data strobe signals output from the memory controller (or other control device) to time reception of write data within the memory device 700. Each of the signal lines in the request path 730 and/or data path 732 may be single-ended or differential. Also, in alternative embodiments, different numbers and types of signals may be conducted via the request path 730 and/or data path 732.

In the embodiment of FIG. 22, memory device 700 has substantially the same architecture as memory device 100 of FIG. 2. That is, the memory device 700 has four quadrants, Q0-Q3, eight banks B0-B7 (each formed by a pair of A and B sub-banks), together with column decoders 703₀-703₃ and row decoders 713₀-713₃ that correspond to the column decoders 103₀-103₃ and row decoders 113₀-113₃ of FIG. 2, though the banks may have different width and/or depth dimensions and the column and row decoders may be correspondingly revised to accommodate the differently-dimensioned banks. Also, signal paths 709₀-709₃, 711₀-711₃, 715₀-715₃, 717₀-717₃ and 719₀-719₃ correspond to the signal paths 109₀-109₃, 111₀-111₃, 115₀-115₃, 117₀-117₃ and 119₀-119₃ of FIG. 2, though such signal paths may include different numbers of signal lines as necessary to accommodate the different bank dimensions. Further, as with the memory device 100, memory device 700 is assumed for purpose of description to be a DRAM device, but may be any type of memory device having multiple storage arrays that share addressing and/or data path resources in a manner that imposes timing constraints on sequential accesses directed to the different storage arrays. Also, the memory device 700 may have a different number of banks, sub-banks per bank and/or number of sub-banks per decoder-sharing group in alternative embodiments.

Turning to FIG. 23, a row activation pipeline 731A, column access pipeline 731B, and precharge pipeline 731C illustrate an exemplary sequence of row activation requests, column access requests and precharge requests received in the request interface 701 via request path 730. Referring first to row activation pipeline 731A, a pair of row activation requests, directed alternately to even and odd banks of the memory device 700, is received in each t_(RR) interval, starting with row activation requests 733 and 735. The request interface 701 responds to each pair of row activation requests by initiating row activation operations in the corresponding banks of the memory device 700.

A predetermined time after receipt of the row activation requests 733 and 735, a sequence of four multi-address, micro-threaded column access requests 737, 739, 741 and 743 is received, each pair of the column access requests being received in a respective t_(CC) interval and each individual column access request received within a given t_(CC) interval being directed to the open page for the bank specified in a respective one of row activation requests 733 and 735. Referring to FIGS. 22 and 23, for example, column access request 737 is directed to the open page of bank B0 (opened in response to row activation request 733) and specifies accesses therein at column addresses ‘a’ and ‘c.’ Column access request 739, received in the same t_(CC) interval as column access request 737, is directed to the open page of bank B1 (opened in response to row activation request 735) and specifies accesses therein at column addresses ‘e’ and ‘g.’ Column access requests 741 and 743 are received in a second t_(CC) interval, with column access request 741 directed to columns ‘b’ and ‘d’ of the open page of bank B0, and column access request 743 directed to columns ‘f’ and ‘h’ of the open page of bank B1. The open pages are closed in precharge operations requested in precharge requests 745 and 747. In the particular embodiment shown, each request interval is 1 ns (i.e., requests are transferred over the individual signal lines of path 730 at 1 Gb/s), so that an 8 ns t_(RR) constraint and a 4 ns t_(CC) constraint are assumed. Also, the t_(RC) constraint is assumed to be 40 ns, so that activation requests directed to the rows specified in requests 733 and 735 are not issued again until after a 40 ns interval has elapsed. Other request transfer rates may be used and different t_(RR), t_(CC) and/or t_(RC) constraints may apply in alternative embodiments.

The request decoder responds to the incoming column access requests 737, 739, 741 and 743 by issuing signals to the appropriate column decoders to perform the access operations (e.g., read or write operations), with data that corresponds to each column access being transferred via DQ links DQ[31:0] over a respective partial t_(CC) interval (t_(CCp)). More specifically, as shown in detail view 738, each t_(CC) envelope is spatially and temporally subdivided so that, over the t_(CCp) interval starting at 736, data that corresponds to column ‘a’ of column access request 737 (i.e., data being written to column ‘a’ or read from column ‘a’) is transferred via a first portion of the DQ links, DQ[31:16], and data that corresponds to column ‘c’ of column access request 737 is transferred via a second portion of the DQ links, DQ[15:0]. Similarly, during the next t_(CCp) interval, data that corresponds to columns ‘e’ and ‘g’ of column access request 739 is transferred via DQ links, DQ[31:16] and DQ[15:0], respectively. Thus, over the t_(CC) interval starting at 736, data transfers that correspond to four different micro-threaded column access transactions are carried out. In the exemplary embodiment shown, data is transferred over each of the DQ links at 2 Gb/s, so that four bits per link are transferred over each 2 ns t_(CCp) interval. Consequently, an 8-byte column transaction granularity is achieved in a device otherwise having a 32-byte t_(CC) envelope. During the t_(CC) interval that follows the Ce/Cg data transfer, four additional data transfers are carried out in response to the micro-threaded column access requests specified in requests 741 and 743.
That is, during a first t_(CCp) interval, data that corresponds to columns ‘b’ and ‘d’ of column access request 741 is transferred via DQ links DQ[31:16] and DQ[15:0], respectively, and during the next t_(CCp) interval, data that corresponds to columns ‘f’ and ‘h’ of column access request 743 is transferred via DQ links DQ[31:16] and DQ[15:0], respectively. Thus, the total amount of data transferred over the t_(RR) interval starting at 736 is 64 bytes, with one half of the total t_(RR) envelope being allocated to data transfer for each of the rows activated in response to row activation requests 733 and 735. That is, the 64-byte t_(RR) envelope is temporally subdivided between the rows activated in response to requests 733 and 735 to achieve a 32-byte row transaction granularity.
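
The envelope and granularity figures quoted above follow directly from the link count, link rate and interval lengths of the exemplary embodiment. The short Python sketch below reproduces that arithmetic; the helper name `envelope_bytes` and the constant names are hypothetical and are introduced only for illustration.

```python
# Figures from the exemplary embodiment; all names here are illustrative only.
DQ_LINKS = 32          # DQ[31:0]
LINK_RATE_GBPS = 2     # per-link data rate (1 Gb/s equals 1 bit per ns)
T_CC_NS = 4            # t_CC interval
T_CCP_NS = 2           # partial t_CC interval (half of t_CC)
T_RR_NS = 8            # t_RR interval


def envelope_bytes(links: int, rate_gbps: int, interval_ns: int) -> int:
    """Bytes carried by `links` signal links over `interval_ns` nanoseconds."""
    return links * rate_gbps * interval_ns // 8


# Full-width transfer over one t_CC interval.
tcc_envelope = envelope_bytes(DQ_LINKS, LINK_RATE_GBPS, T_CC_NS)
# One micro-threaded column access: half of the links for one t_CCp interval.
column_granularity = envelope_bytes(DQ_LINKS // 2, LINK_RATE_GBPS, T_CCP_NS)
# Full-width transfer over one t_RR interval, shared by two activated rows.
trr_envelope = envelope_bytes(DQ_LINKS, LINK_RATE_GBPS, T_RR_NS)
row_granularity = trr_envelope // 2

print(tcc_envelope, column_granularity, trr_envelope, row_granularity)
```

With the values given, the sketch yields a 32-byte t_(CC) envelope, an 8-byte column transaction granularity, a 64-byte t_(RR) envelope and a 32-byte row transaction granularity, matching the text.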

Depending on the number of bits required to specify a column address within the memory device of FIG. 23, the 19-bit request size (i.e., established by the width of request path 730) may be insufficient to carry two complete column addresses. In one embodiment, this circumstance is overcome by storing a set of offset values within the memory device 700 and including an offset select value within incoming multi-address column access requests to select one of the pre-stored offset values. The selected offset value may then be used directly as the second column address or may be combined with a fully specified column address to form a relative column address. For example, in the exemplary format shown at 740, column access request 737 includes an operation specifier, “Col Cmd,” that specifies the type of column access (e.g., read, write, masked write, etc.); a bank address, “Bank Addr,” that specifies the bank to which the column access is directed; a fully-specified column address, “Col Addr1,” that specifies a first column address (e.g., column ‘a’ in request 737); and an offset select value, “OSEL,” that specifies a pre-stored offset value to be summed (or otherwise arithmetically or logically combined) with the fully-specified column address to produce the second column address. That is, as shown at 742, the offset select value may be applied to the control input of a multiplexer 744 to select one of n offset values, Coff0-Coff(n−1), to be summed with column address, Ca, in adder 746, thereby producing the second column address, Cc.

FIG. 24 illustrates a more detailed example of address information provided, via lines BA[2:0] and A[10:0] of request path 730, as part of a column access request. The BA[2:0] lines carry a three-bit bank address specifying one of eight banks, while address lines A9 and A7-A2 carry a fully-specified, seven-bit column address, “Col Addr1.” Address lines A1 and A0 carry a two-bit offset select value which is applied to select one of four column offsets, Coff0-Coff3, to be added to the fully-specified column address. The resulting relative column address constitutes the second column address, “Col Addr2,” specified in the column access request. The signal carried on address line A8 indicates whether a normal precharge or auto-precharge is to be carried out (e.g., the auto-precharge occurring at the conclusion of the specified column access operation), and the signal carried on address line A10 is reserved. Different signal encodings on the bank address lines and address lines or other lines of the request path may be used in alternative embodiments. Also, more or fewer column offsets may be stored to enable a larger or smaller selection of column offset values. For example, bit A10 may be used to carry the most significant bit of an offset select value, thereby enabling selection of one of eight column offset values.
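
The address decoding just described can be sketched in Python as follows. This is a minimal illustration rather than the device's actual logic: the function and type names are hypothetical, A9 is assumed to be the most significant bit of the seven-bit column address, the sum is assumed to wrap within seven bits, the offset values are invented, and A8=1 is taken to select auto-precharge.

```python
from typing import NamedTuple, Sequence


class ColumnRequest(NamedTuple):
    bank: int
    col_addr1: int
    col_addr2: int
    auto_precharge: bool


def decode_column_request(ba: int, a: int, coff: Sequence[int]) -> ColumnRequest:
    """Decode BA[2:0] and A[10:0] of a dual-address column access request.

    `coff` models the four pre-stored offsets Coff0-Coff3.
    """
    bank = ba & 0b111                                   # BA[2:0]: one of eight banks
    # Col Addr1 is carried on A9 and A7-A2 (A9 assumed to be the MSB).
    col_addr1 = (((a >> 9) & 1) << 6) | ((a >> 2) & 0b111111)
    osel = a & 0b11                                     # A1, A0: offset select
    auto_precharge = bool((a >> 8) & 1)                 # A8 (1 assumed = auto-precharge)
    # Col Addr2 = Col Addr1 + selected offset (7-bit wrap is an assumption).
    col_addr2 = (col_addr1 + coff[osel]) & 0x7F
    return ColumnRequest(bank, col_addr1, col_addr2, auto_precharge)


offsets = [1, 2, 4, 8]  # hypothetical contents of Coff0-Coff3
a_lines = (1 << 9) | (1 << 8) | (0b000101 << 2) | 0b10
req = decode_column_request(0b011, a_lines, offsets)
print(req)
```

In this example the request targets bank 3, the fully-specified column address is 69, and offset select value 2 picks the offset 4, giving a second column address of 73.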

FIG. 25 illustrates exemplary configuration information that may be provided in conjunction with a load mode register command issued to the memory device 700 of FIG. 22. The load mode register command may be specified, for example, by driving the CS, RAS, CAS, and WE lines of the request path low during a request interval. As shown, the signals carried on lines BA[2:0] indicate the nature of the operation to be performed, with ‘000’ and ‘001’ codes indicating that bits A[10:0] (i.e., the signals carried on lines A[10:0]) are to be loaded into a device mode register or extended mode register, respectively, (e.g., to program device output latency, burst length and/or other device operating characteristics), codes ‘010-110’ being reserved or used for other functions, and code ‘111’ indicating that bits A[10:0] are to be loaded into a micro-thread mode register (i.e., uMode register). In a load to the micro-thread mode register, bits A9 and A7-A2 form a column offset value to be loaded into one of four column offset fields of the micro-thread mode register, and bits A1 and A0 indicate which of the four column offset fields, Coff0-Coff3, is to be loaded. Bits A8 and A10 are coded to one of four values (00, 01, 10, 11) to specify one of four modes: a single-threaded mode (ST) within the memory device; a two-by-two micro-threaded mode (MT2×2) in which a single column address is provided in each micro-threaded column access request to enable two-way partitioning of the t_(CC) envelope and with the micro-threaded column accesses in each t_(RR) interval directed to two different banks (e.g., to enable micro-threading as described in reference to FIGS. 7-9); a four-by-two micro-threaded mode (MT4×2) in which two column addresses are provided in each micro-threaded column access request to enable four-way partitioning of the t_(CC) envelope and with the micro-threaded column accesses in each t_(RR) interval directed to two different banks (e.g., as described in reference to FIG. 23); and a four-by-four micro-threaded mode (MT4×4) in which four row activation requests are received per t_(RR) interval to enable each of four dual-address column access requests to be directed to a different bank, thereby achieving four-way partitioning of each t_(CC) interval and enabling four different banks to be accessed in each t_(RR) interval (e.g., as described in reference to FIG. 17).
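
As a rough illustration of the micro-thread mode register load described above, the following Python sketch models the uMode register and the ‘111’ load operation. Several details are assumptions because the text does not state them: the assignment of the codes 00-11 to the ST, MT2×2, MT4×2 and MT4×4 modes, the choice of A10 as the more significant mode bit, and the choice of A9 as the most significant offset bit. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed code-to-mode ordering; the actual encoding is defined by FIG. 25.
MODES = ("ST", "MT2x2", "MT4x2", "MT4x4")


@dataclass
class UModeRegister:
    coff: List[int] = field(default_factory=lambda: [0, 0, 0, 0])  # Coff0-Coff3
    mode: str = "ST"


def load_mode_register(ba: int, a: int, umode: UModeRegister) -> bool:
    """Apply a load mode register command; return True if uMode was loaded."""
    if (ba & 0b111) != 0b111:
        # Codes 000/001 target the mode/extended mode registers and
        # 010-110 are reserved; none of them is modeled here.
        return False
    offset = (((a >> 9) & 1) << 6) | ((a >> 2) & 0b111111)  # A9, A7-A2
    field_select = a & 0b11                                  # A1, A0
    mode_code = (((a >> 10) & 1) << 1) | ((a >> 8) & 1)      # A10, A8 (order assumed)
    umode.coff[field_select] = offset
    umode.mode = MODES[mode_code]
    return True


reg = UModeRegister()
# Load offset 3 into Coff1 and select mode code 0b10 (A10=1, A8=0).
loaded = load_mode_register(0b111, (1 << 10) | (0b000011 << 2) | 0b01, reg)
print(loaded, reg)
```

Under the assumed encoding, the example leaves the register in the MT4x2 mode with Coff1 set to 3; a controller would issue up to four such commands to fill Coff0-Coff3.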

FIGS. 26 and 27 illustrate four-by-four micro-threaded memory operations in a memory device 750 having the data path interfaces 705A and 705B described in reference to FIGS. 22 and 23 to interface with a legacy data path, and having a request interface 751 that is substantially similar to the request interface 701, except that an additional bank address input is provided to receive a fourth bank address bit. By this arrangement, a sequence of row activation requests specifying a bank in each of the four quadrants (Q0-Q3) of the memory device may be received within a single t_(RR) interval, thereby enabling each of four dual-address column access requests to be directed to a respective one of the four quadrants. Because unique bank, row and column addresses may be delivered to the address decoders (i.e., column decoders 703 and row decoders 713) for each quadrant, the storage arrays in each quadrant are effectively converted from sub-banks to banks, thereby yielding a sixteen bank architecture having banks B0-B15 as shown in FIG. 26. As in the other memory device embodiments discussed above, memory device 750 may have more or fewer storage arrays in alternative embodiments yielding correspondingly more or fewer banks. Also, any of the sixteen banks may include any number of constituent sub-banks.

Referring to FIG. 27, the row activation pipeline 755A is more densely loaded than in the embodiment described in reference to FIG. 23 to deliver row activation requests 763, 765, 767 and 769 directed to banks within each of the four quadrants of the memory device 750 in a single t_(RR) interval. That is, the t_(RR) interval is sub-divided into four t_(RRp) intervals, with a row activation request being received in each. In the specific example shown in FIGS. 26 and 27, rows ‘w’, ‘y’, ‘x’ and ‘z’ are activated one after another in banks B0, B1, B8 and B9, though rows may be activated in each of the four quadrants in a different order in subsequent transactions or in alternative embodiments.

Referring to column access pipeline 755B, a predetermined time after the first row activation request 763 is received, a pair of dual-address micro-threaded column access requests 771, 773 are received one after another in a first t_(CC) interval. The first column access request 771 is directed to the same bank (B0) as the first row activation request 763 and specifies a pair of column locations ‘a’ and ‘e’ (e.g., a fully-specified column address and offset select value as discussed in reference to FIGS. 23-25) to be accessed one after another in successive t_(CC) intervals. The second column access request 773 is similarly directed to the same bank (B1) as the second row activation request 765 and specifies a pair of column locations ‘c’ and ‘g’ to be accessed one after another in successive t_(CC) intervals. Column data that corresponds to the first column address ‘a’ of column access request 771 is transferred a predetermined time later over the t_(CCp) interval starting at 770 and via the subset of DQ links, DQ[31:16], coupled to the data path interface 705A for banks B0-B7. Column data that corresponds to the first column address ‘c’ of column access request 773 is transferred via the same DQ link subset, DQ[31:16], over the t_(CCp) interval that starts at 772 (i.e., over the second half of the t_(CC) interval that starts at 770). During the succeeding t_(CC) interval, transfers from the open pages are repeated in respective t_(CCp) intervals to transfer B0, column ‘e’ data and B1, column ‘g’ data. Thus, data for the two column access requests directed to low-order banks, B0-B7, are transferred in interleaved fashion (i.e., B0-Ca, B1-Cc, B0-Ce, B1-Cg) in respective t_(CCp) intervals and over a subset of the DQ links. Data for the two column access requests 775, 777 directed to high-order banks, B8-B15, are similarly transferred in interleaved fashion (i.e., B8-Cb, B9-Cd, B8-Cf, B9-Ch) in respective t_(CCp) intervals and over the DQ link subset, DQ[15:0].
Overall, the entire data transfer sequence in response to the column access requests 775, 777 occurs over a t_(RR) interval that starts at 772. In the embodiment of FIG. 27, the data transferred in response to column access requests 777 and 775 is delayed by a t_(CC) interval relative to the data transferred in response to column access requests 771, 773 due to the receipt of the column access requests 777 and 775 one t_(CC) interval after requests 771 and 773. In an alternative embodiment, the data to be transferred in response to the earlier-received pair of column access requests may be buffered, then output over the same t_(RR) interval as the data transferred in response to the later-received pair of column access requests. In either case, because each 32-byte t_(CC) envelope is spatially halved and temporally halved to accommodate four micro-threaded column access transactions, an 8-byte column transaction granularity is achieved. Also, because each 64-byte t_(RR) envelope is subdivided to enable data transfer to or from four different banks, a 16-byte row transaction granularity is achieved.

Still referring to FIGS. 26 and 27, the relative addressing scheme discussed in reference to FIGS. 23-25 may be used to convey the second column address in each of the column access requests 771, 773, 775, 777. Also, because an additional bank address bit is provided via line BA[3] of signal path 756, the operation encoding shown in the bank address field of FIG. 25 may be different and/or include additional or different operations. Further, because bandwidth for specifying precharge operations is consumed by the more densely loaded row request pipeline 755A, precharge operations may be specified by the auto-precharge option indicated in FIG. 24 (i.e., A8=1). Such precharge operations are shown in cross-hatched request intervals in the precharge pipeline 755C to provide an example of when such operations are carried out, but are specified in the corresponding column access requests 771, 773, 775 and 777, rather than in explicit precharge requests.

FIG. 28 illustrates an exemplary timing signal arrangement that may be used to convey the fourth bank address bit used in the embodiments of FIGS. 26 and 27, thereby obviating the added BA signal link and enabling four-by-four micro-threaded operation using the legacy signal path 730 of FIG. 23. In the particular example shown, instead of using a full-frequency timing signal 790 (i.e., clock signal or strobe signal) to time request transfer over the request path, a reduced-frequency timing signal 792 that exhibits alternating rising and falling edges at the start of every second request interval is used to convey the least significant bank address bit, BA[0], while the BA[2:0] signal lines are used to convey the most significant bank address bits BA[3:1]. If the first quadrant to be accessed in a given t_(RR) interval is an even quadrant (Q0 or Q2), the timing signal 792 is output with a rising edge that arrives at the memory device synchronously with respect to the corresponding row activation request (or column access request) to convey BA[0]=‘0.’ If the first quadrant to be accessed is an odd quadrant (Q1 or Q3), the timing signal is output with a falling edge that arrives at the memory device synchronously with respect to the corresponding row activation request to convey BA[0]=‘1.’ In the embodiment of FIGS. 26 and 27, the least significant bit of the bank address toggles with each successive row activation request (or column access request), so that the edge of the timing signal that corresponds to the second row activation request (or column access request) within a given t_(RR) interval will select the appropriate odd or even bank set, namely the bank set opposite the one selected by the edge of the timing signal that corresponds to the first row activation request. In the particular example shown in FIG. 28, a portion of the row request pipeline 755A containing row activation requests 763, 765, 767 and 769 is shown in edge alignment with the timing signal 792.
The initial rising edge transition of the timing signal 792 indicates that BA[0] is a 0 so that, by delivering address values BA[3:1]=000 in row activation request 763 (i.e., via lines BA[2:0] of the request path), bank B0 is specified by the row activation request 763. The subsequent falling edge transition of the timing signal coincides with the arrival of row activation request 765 and indicates that BA[0]=1. Accordingly, by delivering address values BA[3:1]=000 in row activation request 765, bank B1 is specified. Banks B8 and B9 are similarly specified in row activation requests 767 and 769 by specifying BA[3:1]=100 in conjunction with a rising-edge transition and falling-edge transition, respectively, of timing signal 792. A clock recovery circuit such as phase-locked loop 794 may be used to generate an internal timing signal 795 that is phase aligned with transitions of timing signal 792 but having a frequency that corresponds, for example, to the frequency of signal 790. The internal timing signal 795, which may itself be a clock signal or strobe signal, may then be used to control sampling of signals conveyed on the request path in order to capture a new request in each request interval.
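
The edge-based bank addressing above amounts to forming a four-bit bank address from the three BA lines and the polarity of the timing-signal edge. A minimal Python sketch, with a hypothetical function name, is given below; it models only the address formation, not the clock recovery or sampling circuitry.

```python
def decode_bank(ba_lines: int, rising_edge: bool) -> int:
    """Form the 4-bit bank address from BA[2:0] and the timing-signal edge.

    The BA[2:0] lines carry bank address bits BA[3:1]; a rising edge of the
    reduced-frequency timing signal conveys BA[0]=0, a falling edge BA[0]=1.
    """
    ba0 = 0 if rising_edge else 1
    return ((ba_lines & 0b111) << 1) | ba0


# Row activation requests 763, 765, 767 and 769 from the example of FIG. 28.
banks = [
    decode_bank(0b000, rising_edge=True),
    decode_bank(0b000, rising_edge=False),
    decode_bank(0b100, rising_edge=True),
    decode_bank(0b100, rising_edge=False),
]
print(banks)
```

The four calls reproduce the bank sequence B0, B1, B8, B9 described for row activation requests 763, 765, 767 and 769.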

FIG. 29 illustrates an embodiment of a memory system 800 that includes a memory controller 801 and at least one micro-threaded memory device 803. The micro-threaded memory device 803 may be implemented according to any of the above-described embodiments, but for present purposes is assumed to have at least two data path interfaces, DQA and DQB, for accessing respective sets of eight storage banks (i.e., DQA is used to transfer data to and from banks B0-B7, and DQB is used to transfer data to and from banks B8-B15), and a request interface, RQ, for receiving row and column requests and controlling execution of the requested row and column operations. The storage banks themselves are additionally organized in quadrants, Q0-Q3, as described above in reference to memory devices 100, 530, 700 and 750, though other storage bank organizations may be used. The DQA and DQB data path interfaces are coupled to the memory controller via respective subsets of DQ links 802a and 802b within data path 802, and the request interface is coupled to the memory controller via a request path 804. In the embodiment shown, the request path 804 is formed by a set of point-to-point links, while the data path 802 is formed by multi-drop links (i.e., data path interfaces of one or more other memory devices may be coupled to the data path or a subset of the DQ links thereof). In alternative embodiments, the request path 804 may be a multi-drop path coupled to request interfaces of one or more additional memory devices (not shown) and/or the data path 802 may be formed by point-to-point links between the memory controller 801 and memory device 803. Also, the memory device 803 may be one of multiple memory devices disposed on a memory module and coupled to a buffering circuit via a set of point-to-point links (or a multi-drop path).
The buffering circuit may receive requests and/or data directed to any of the memory devices on the memory module, and retransmit the requests and/or data to the target memory device via the corresponding set of point-to-point links.

The memory controller 801 includes a read transaction queue 811 (RTQ), write transaction queue 815 (WTQ), read data buffer 819, queue control logic 817 and host interface 810. During an initialization or reconfiguration operation, system configuration requests are delivered to the memory controller 801 which, in turn, programs the memory device 803 (including other memory devices if present) to operate in the specified mode, for example, by issuing programming information via request path 804, data path 802 and/or one or more other paths between the memory controller 801 and the memory device 803 (e.g., a sideband path, not shown). In one embodiment, for example, the memory controller 801 or other device may read a configuration memory associated with memory device 803 (e.g., a serial presence detect (SPD) or the like) to determine operating characteristics, constraints and modes of the memory device 803 (e.g., in the case of a dual in-line memory module (DIMM) or the like having the memory device 803 and one or more other like devices mounted thereto, the t_(CC), t_(RR) and/or t_(RC) constraints for memory device 803 may be recorded within the configuration memory), then pass such information back to a processor or other host. The processor may process such information (e.g., as part of basic input-output system (BIOS) code execution), then program the memory controller 801 to establish a desired memory configuration, including instructing the memory controller 801 to program the memory device 803. For example, the memory controller 801 may be instructed to issue micro-thread-mode register set commands as described in reference to FIG. 25 to program the memory device 803 for single-threaded operating mode, or any of the micro-threaded operating modes described above (e.g., the MT2×2, MT4×2 and MT4×4 operating modes described above).
The memory controller 801 may also include one or more internal configuration registers that are programmed in response to instructions received via host interface 810 to establish single-threaded control mode or micro-threaded control mode.

After the memory device 803 and memory controller 801 have been configured (or re-configured in the case where operating modes may be switched during run-time operation), one or more host devices such as a general-purpose processor, graphics processor, network processor, and/or direct memory access (DMA) controller may issue memory access requests to the memory controller 801, including memory read requests, memory write requests, masked write requests, read-modify-write requests and so forth. The incoming memory access requests are received in the queue control logic 817 which, in turn, queues the requests in either the read transaction queue 811 or write transaction queue 815 according to whether they specify read or write access.

The read transaction queue (RTQ) 811 includes four sets of read queues, Qr0/2/4/6, Qr1/3/5/7, Qr8/10/12/14 and Qr9/11/13/15 that correspond to four quadrants of storage banks within the memory device 803. When read requests are received within the queue control logic 817, the queue control logic 817 determines, based on address information included with the request, the memory device and storage bank to which the request is directed and stores the request in the corresponding read queue. For example, requests directed to bank 0 of the memory device 803 are stored in read queue Qr0, requests directed to bank 8 are stored in read queue Qr8 and so forth. By organizing the read requests within the read queues in this manner, the memory controller 801 is able to issue row activation and column access requests in an order that supports micro-threaded memory access transactions within the memory device 803. For example, assuming that each of the four sets of read queues includes at least one memory read request, then the queue control logic 817 may issue respective enable signals (i.e., EN₁, EN₂, EN₃, EN₄, only one of which is shown in FIG. 29) to first stage multiplexers 823 to control the selection of one of the four read queues from each set during a given t_(RR) interval. The queue control logic 817 additionally issues a selection-enable signal (ENp) to second stage multiplexer 825 to select one of the four first stage multiplexers 823 to output a read request from a selected read queue during each t_(RRp) interval or, in the case of column access requests, during each t_(CCp) interval. That is, the queue control logic 817 may transition the ENp signal from one state to another at the end of each t_(RRp) or t_(CCp) interval to select another of the first stage multiplexers 823, thereby enabling requests to be directed to each of the four quadrants of the memory device in round-robin fashion.
The second stage multiplexer 825 outputs requests to read/write multiplexer 827 which passes requests from either the read transaction queue 811 or write transaction queue 815 onto request path 804 in response to a control signal (R/W) from the queue control logic 817. Read data output from the memory device 803 in response to the micro-threaded read requests are delivered to the memory controller 801 via data path 802 and buffered in a read data buffer 819. The queue control logic 817 associates the read data with corresponding host read requests and outputs the read data to the requesting host device via the host interface.
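
The two-stage queue selection described above may be easier to follow as a behavioral model. The Python sketch below is illustrative only: the class and method names are hypothetical, the first-stage policy (lowest-numbered non-empty bank queue within a set) and the skipping of empty sets are assumptions not stated in the text, and timing is abstracted so that each call to `issue` stands for one t_(RRp) or t_(CCp) interval.

```python
from collections import deque
from typing import Optional, Tuple


class ReadTransactionQueue:
    """Behavioral sketch of per-bank read queues grouped into four sets."""

    def __init__(self) -> None:
        self.queues = {bank: deque() for bank in range(16)}  # Qr0-Qr15
        self.next_set = 0  # models the state of the ENp selection signal

    @staticmethod
    def set_index(bank: int) -> int:
        # Sets: Qr0/2/4/6 -> 0, Qr1/3/5/7 -> 1, Qr8/10/12/14 -> 2, Qr9/11/13/15 -> 3
        return (2 if bank >= 8 else 0) + (bank & 1)

    def enqueue(self, bank: int, request: str) -> None:
        self.queues[bank].append(request)

    def _first_stage(self, queue_set: int) -> Optional[int]:
        # First-stage multiplexer: pick one queue within the set
        # (assumed policy: lowest-numbered bank with a pending request).
        for bank in range(16):
            if self.set_index(bank) == queue_set and self.queues[bank]:
                return bank
        return None

    def issue(self) -> Optional[Tuple[int, str]]:
        # Second-stage multiplexer: rotate through the four sets round-robin,
        # skipping sets with nothing pending (an assumption).
        for step in range(4):
            queue_set = (self.next_set + step) % 4
            bank = self._first_stage(queue_set)
            if bank is not None:
                self.next_set = (queue_set + 1) % 4
                return bank, self.queues[bank].popleft()
        return None


rtq = ReadTransactionQueue()
for bank in (9, 0, 8, 1, 2):
    rtq.enqueue(bank, f"read@B{bank}")
issued = [rtq.issue() for _ in range(6)]
print(issued)
```

Although the requests arrive in the order B9, B0, B8, B1, B2, the model issues them as B0, B1, B8, B9, B2, so that successive requests are directed to different quadrants, and it returns None once every queue is empty.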

The write transaction queue 815 includes four sets of write queues (i.e., Qw0/2/4/6, Qw1/3/5/7, Qw8/10/12/14 and Qw9/11/13/15), first stage multiplexers 833 and second stage multiplexer 835, each of which operates in generally the same manner as its counterpart in the read transaction queue 811, except that the write queues store and output write data in addition to write requests. Thus, write requests may be issued to each of the four quadrants of the memory device 803 on even t_(RRp) or t_(CCp) intervals to initiate micro-threaded write transactions therein.

Embodiments in Computer-Readable Media

It should be noted that the various circuits disclosed herein (e.g., memory devices or component circuits thereof) may be described using computer aided design tools and expressed (or represented), as data and/or instructions embodied in various computer-readable media, in terms of their behavioral, register transfer, logic component, transistor, layout geometries, and/or other characteristics. Formats of files and other objects in which such circuit expressions may be implemented include, but are not limited to, formats supporting behavioral languages such as C, Verilog, and HLDL, formats supporting register level description languages like RTL, and formats supporting geometry description languages such as GDSII, GDSIII, GDSIV, CIF, MEBES and any other suitable formats and languages. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signaling media or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., HTTP, FTP, SMTP, etc.).

When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the above described circuits may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs including, without limitation, net-list generation programs, place and route programs and the like, to generate a representation or image of a physical manifestation of such circuits. Such representation or image may thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form various components of the circuits in a device fabrication process.

The section headings provided in this detailed description are for convenience of reference only, and in no way define, limit, construe or describe the scope or extent of such sections. Also, while the invention has been described with reference to specific embodiments thereof, it will be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.

1. (canceled)
2. A memory device comprising: bank groups, each bank group having banks of memory; and a command interface to receive, from the memory controller, row activation commands to instruct row activations and column access commands to instruct column accesses; wherein a first interval that is to transpire between back-to-back row activations to banks within different ones of the bank groups, is shorter than a second interval that is to transpire between back-to-back row activations to banks within a common one of the bank groups, and a third interval that is to transpire between back-to-back column accesses to banks within different ones of the bank groups, is shorter than a fourth interval that is to transpire between back-to-back column accesses to banks within a common one of the bank groups.
3. The memory device of claim 2, wherein: the memory device further comprises circuitry to receive a clock signal from a memory controller, the clock signal having clock transitions; and each of the first interval, the second interval, the third interval and the fourth interval are defined by respective numbers of the clock transitions.
4. The memory device of claim 2, wherein: each bank includes a first sub-bank and a second sub-bank; the first sub-bank is in a first bank group of the bank groups; and the second sub-bank is in a second bank group of the bank groups.
5. The memory device of claim 4, wherein: each sub-bank in the first bank group shares first row decoder circuitry and first column decoder circuitry with each other sub-bank in the first bank group; and each sub-bank in the second bank group shares second row decoder circuitry and second column decoder circuitry with each other sub-bank in the second bank group.
6. The memory device of claim 2, further comprising circuitry to service the row activation commands and the column activation commands, wherein the circuitry to service the row activation commands and the column activation commands is to receive at least one column access command for each row activation command, in order to access a column within a row activated by a corresponding row activation command, and wherein the command interface is to receive bank address information from the memory controller for each row activation command and for each column access command, in order to select a bank and in order to select one of the bank groups in the bank groups.
7. The memory device of claim 6, wherein the circuitry to service the row activation commands and the column activation commands is to receive, for a given row activation command, at least two column access commands, in order to access different columns within a row activated by the given row activation command.
8. The memory device of claim 2, wherein the memory device further comprises a data interface to transfer data with the memory controller via links, using respective, mutually-exclusive subsets of the links to transfer data in association with the back-to-back column accesses to banks within different ones of the bank groups.
9. The memory device of claim 2, wherein: the memory device further comprises a data interface to transfer data with the memory controller via links, in association with each column access command; and the memory device further comprises serialization/deserialization circuitry to transfer serialized data with the memory controller over each of the links in association with each column access command.
10. The memory device of claim 2, wherein the memory device further comprises a data interface to transfer each of data and write mask values with the memory controller, via respective link subsets.
11. The memory device of claim 2, wherein the memory device is to receive interleaved row activation commands and interleaved column access commands for back-to-back data accesses in different ones of the bank groups, and is to receive bank address information as part of each command of the row activation commands and column access commands, to select a bank group of the bank groups, and to select a bank within the selected bank group.
12. The memory device of claim 2, wherein: the first interval is longer than the third interval and the second interval is longer than the fourth interval.
 13. A memory device comprising: bankgroups, each bank group having banks of memory; circuitry to receive aclock signal from a memory controller, the clock signal having clocktransitions; a command interface to receive row activation commands andcolumn access commands from the memory controller, via a command bus; adata interface to transfer data with the memory controller inassociation with each column access command; and circuitry coupled tothe command interface to receive the row activation commands and thecolumn access commands from the command interface and to service the rowactivation commands and the column access commands, such that a firstinterval, defined by a first number of clock transitions to transpirebetween back-to-back row activations to banks within different ones ofthe bank groups, is shorter than a second interval, defined by a secondnumber of clock transitions to transpire between back-to-back rowactivations to banks within a common one of the bank groups, and a thirdinterval, defined by a third number of clock transitions to transpirebetween back-to-back column accesses to banks within different ones ofthe bank groups, is shorter than a fourth interval, defined by a fourthnumber of clock transitions to transpire between back-to-back columnaccesses to banks within a common one of the bank groups.
 14. The memory device of claim 13, wherein: each bank includes a first sub-bank and a second sub-bank; the first sub-bank is in a first bank group of the bank groups; and the second sub-bank is in a second bank group of the bank groups.
 15. The memory device of claim 14, wherein: each sub-bank in the first bank group shares first row decoder circuitry and first column decoder circuitry with each other sub-bank in the first bank group; and each sub-bank in the second bank group shares second row decoder circuitry and second column decoder circuitry with each other sub-bank in the second bank group.
 16. The memory device of claim 14, wherein the circuitry to receive and service is to receive at least one column access command for each row activation command, in order to access a column within a row activated by a corresponding row activation command.
 17. The memory device of claim 13, wherein the memory device also comprises a data interface to transfer data with the memory controller via links, using respective, mutually-exclusive subsets of the links to transfer data in association with the back-to-back column accesses to banks within different ones of the bank groups.
 18. The memory device of claim 12, wherein: the memory device further comprises a data interface to transfer data with the memory controller via links, in association with each column access command; and the memory device further comprises serialization/deserialization circuitry to transfer serialized data with the memory controller over each of the links in association with each column access command.
 19. The memory device of claim 12, wherein the memory device further comprises a data interface to transfer each of data and write mask values, with the memory controller, via respective link subsets.
 20. A method of operation in a memory device, the memory device having bank groups, each bank group having banks of memory, the method comprising: receiving a clock signal from a memory controller, the clock signal having clock transitions; receiving from the memory controller, via a command interface, row activation commands to instruct respective row activations and column access commands to instruct respective column accesses, each row activation command to specify a respective one of the bank groups for the row activations, and each column access command to specify a respective one of the bank groups for each column access, such that a first interval, defined by a first number of clock transitions to transpire between back-to-back row activations to banks within different ones of the bank groups, is shorter than a second interval, defined by a second number of clock transitions to transpire between back-to-back row activations to banks within a common one of the bank groups, and a third interval, defined by a third number of clock transitions to transpire between back-to-back column accesses to banks within different ones of the bank groups, is shorter than a fourth interval, defined by a fourth number of clock transitions to transpire between back-to-back column accesses to banks within a common one of the bank groups.
 21. The method of claim 20, wherein: each bank includes a first sub-bank and a second sub-bank; the first sub-bank is in a first bank group of the bank groups; the second sub-bank is in a second bank group of the bank groups; each sub-bank in the first bank group shares row decoder circuitry and column decoder circuitry with each other sub-bank in the first bank group; and each sub-bank in the second bank group shares row decoder circuitry and column decoder circuitry with each other sub-bank in the second bank group.
 22. The method of claim 20, wherein: the method further comprises receiving bank address information as part of each command of the row activation commands and column access commands, to receive memory controller communication of selection of a bank group of the bank groups, and to receive memory controller selection of a bank within the selected bank group.